Data Bandwidth
Index
- GPUInspector.host2device_bandwidth
- GPUInspector.memory_bandwidth
- GPUInspector.memory_bandwidth_saxpy
- GPUInspector.memory_bandwidth_saxpy_scaling
- GPUInspector.memory_bandwidth_scaling
- GPUInspector.p2p_bandwidth
- GPUInspector.p2p_bandwidth_all
- GPUInspector.p2p_bandwidth_bidirectional
- GPUInspector.p2p_bandwidth_bidirectional_all
- GPUInspector.theoretical_memory_bandwidth
References
GPUInspector.memory_bandwidth — Method
memory_bandwidth(; kwargs...)
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a memcpy of a certain amount of data (as specified by memsize).
Keyword arguments:
- memsize (default: GiB(0.5)): memory size to be used.
- device (default: e.g. CUDA.device()): GPU device to be used.
- dtype (default: Cchar): element type of the vectors.
- verbose (default: true): toggle printing.
- io (default: stdout): set the stream where the results should be printed.
See also: memory_bandwidth_scaling.
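The following is a minimal CPU analog of the memcpy-based estimate. It times copies between two host arrays and converts the best time into GiB/s, so it measures host RAM and not a GPU. The helper name memcpy_bandwidth_cpu is hypothetical. The factor of 2 assumes that both the read of the source and the write of the destination are counted. GPUInspector's exact byte accounting may differ.

```julia
# CPU analog of a memcpy-based bandwidth estimate (illustrative only, not GPUInspector code).
# Assumption: the read of `src` and the write of `dst` are both counted, hence the factor 2.
function memcpy_bandwidth_cpu(memsize_bytes::Integer; dtype::Type=Cchar, nbench::Integer=5)
    n = memsize_bytes ÷ sizeof(dtype)        # number of elements that fit into memsize
    src = rand(dtype, n)
    dst = similar(src)
    copyto!(dst, src)                        # warm-up run, so compilation is not timed
    # best (shortest) of nbench timed copies
    t = minimum(@elapsed(copyto!(dst, src)) for _ in 1:nbench)
    return 2 * n * sizeof(dtype) / t / 2^30  # bytes moved per second, in GiB/s
end

bw = memcpy_bandwidth_cpu(2^26)              # 64 MiB of Cchar elements
println("host RAM memcpy: ", round(bw; digits=2), " GiB/s")
```

On a GPU, the arrays would be device arrays and the timing would need to synchronize with the device.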
GPUInspector.memory_bandwidth_saxpy — Method
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a SAXPY, i.e. y[i] = a * x[i] + y[i].
Keyword arguments:
- device (default: e.g. CUDA.device()): GPU device to be used.
- dtype (default: Float32): element type of the vectors.
- size (default: 2^20 * 10): length of the vectors.
- nbench (default: 5): number of measurements to be performed, the best of which is used for the GiB/s computation.
- verbose (default: true): toggle printing.
- io (default: stdout): set the stream where the results should be printed.
See also: memory_bandwidth_saxpy_scaling.
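To show how a SAXPY timing turns into a bandwidth figure, here is a CPU-only sketch. It uses a broadcast instead of a GPU kernel and keeps the best of nbench runs, as described above. The function name saxpy_bandwidth_cpu is hypothetical. It also assumes three memory accesses per element (read x[i], read y[i], write y[i]), which is the usual accounting but is not confirmed here as GPUInspector's formula.

```julia
# CPU analog of a SAXPY-based bandwidth estimate (illustrative, not GPUInspector's kernel).
# Assumption: each element costs 3 memory accesses: read x[i], read y[i], write y[i].
function saxpy_bandwidth_cpu(; dtype::Type=Float32, size::Integer=2^20 * 10, nbench::Integer=5)
    a = dtype(2.5)
    x = rand(dtype, size)
    y = rand(dtype, size)
    y .= a .* x .+ y                                           # warm-up: y[i] = a * x[i] + y[i]
    t = minimum(@elapsed(y .= a .* x .+ y) for _ in 1:nbench)  # best of nbench runs
    return 3 * size * sizeof(dtype) / t / 2^30                 # GiB/s
end

saxpy_bandwidth_cpu(; size=2^22)
```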
GPUInspector.memory_bandwidth_saxpy_scaling — Method
memory_bandwidth_saxpy_scaling() -> sizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth_saxpy) as a function of vector length. If verbose=true (default), displays a Unicode plot. Returns the considered lengths and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth_saxpy.
GPUInspector.memory_bandwidth_scaling — Method
memory_bandwidth_scaling() -> datasizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth) as a function of data size. If verbose=true (default), displays a Unicode plot. Returns the considered data sizes and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth.
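Both scaling functions follow the same pattern: repeat a single measurement for a list of sizes and return the sizes together with the resulting bandwidths. The sketch below illustrates that pattern with a hypothetical helper, bandwidth_scaling, and a host-memory measurement, memcpy_gibs. The size range and the best-of-3 choice are assumptions for illustration, not GPUInspector's defaults.

```julia
# Hypothetical size sweep (not GPUInspector's implementation): `measure` maps a size
# to a bandwidth in GiB/s, and the sweep returns the sizes together with the results.
function bandwidth_scaling(measure, sizes)
    bandwidths = [measure(s) for s in sizes]
    return sizes, bandwidths
end

# Illustrative measure: best-of-3 memcpy bandwidth on the host for `nbytes` bytes.
function memcpy_gibs(nbytes::Integer)
    src = rand(UInt8, nbytes)
    dst = similar(src)
    copyto!(dst, src)                                       # warm-up
    t = minimum(@elapsed(copyto!(dst, src)) for _ in 1:3)
    return 2 * nbytes / t / 2^30                            # read + write counted
end

sizes, bws = bandwidth_scaling(memcpy_gibs, [2^k for k in 18:2:24])  # 256 KiB .. 16 MiB
for (s, b) in zip(sizes, bws)
    println(lpad(s, 9), " B  ", round(b; digits=1), " GiB/s")
end
```

Small sizes typically report lower bandwidth because fixed per-call overheads dominate. This is why a sweep is more informative than a single measurement.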
GPUInspector.theoretical_memory_bandwidth — Method
theoretical_memory_bandwidth(; device, verbose)
Estimates the theoretical maximum GPU memory bandwidth in GiB/s.
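A common way to compute a theoretical peak is to multiply the memory clock rate by the bus width in bytes and by 2 for double-data-rate memory. The sketch below assumes that formula, with the clock given in kHz and the bus width in bits. These are the units in which the CUDA driver reports the memory clock rate and global memory bus width device attributes. Whether GPUInspector uses exactly this formula is an assumption, and the helper name theoretical_bandwidth_gibs is hypothetical.

```julia
# Standard peak-bandwidth formula for double-data-rate memory (assumed, GPUInspector may differ):
#   bandwidth = 2 (transfers per clock) * memory clock * bus width in bytes
function theoretical_bandwidth_gibs(mem_clock_khz::Real, bus_width_bits::Integer)
    bytes_per_second = 2 * mem_clock_khz * 1e3 * (bus_width_bits ÷ 8)
    return bytes_per_second / 2^30
end

# Illustrative input values: 1215 MHz memory clock, 5120-bit memory bus.
theoretical_bandwidth_gibs(1_215_000, 5120)   # ≈ 1448.4 GiB/s
```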
GPUInspector.host2device_bandwidth — Method
host2device_bandwidth(; kwargs...)
Performs a host-to-device memory copy benchmark (time measurement) and returns the host-to-device bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
- memsize (default: GiB(0.5)): memory size to be used.
- nbench (default: 10): number of time measurements (i.e. host-to-device memcopies).
- verbose (default: true): set to false to turn off any printing.
- stats (default: false): when true, shows statistical information about the benchmark.
- times (default: false): toggle printing of measured times.
- dtype (default: Cchar): used data type.
- io (default: stdout): set the stream where the results should be printed.
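As an illustration of what statistics over the nbench measurements can look like, the sketch below converts a vector of copy times into per-run bandwidths and summarizes them. The helper bandwidth_stats and the timings are hypothetical illustrative inputs, not real measurements. The exact statistics printed by the stats option may differ. A one-directional host-to-device copy is assumed to move memsize bytes per run.

```julia
using Statistics

# Hypothetical helper: convert nbench measured copy times (in seconds) for a transfer of
# `nbytes` into bandwidth statistics (GiB/s). One-directional copy: bandwidth = bytes / time.
function bandwidth_stats(nbytes::Integer, times::AbstractVector{<:Real})
    bws = nbytes ./ times ./ 2^30
    return (max = maximum(bws), min = minimum(bws), mean = mean(bws), std = std(bws))
end

# Illustrative timings (not real measurements) for a GiB(0.5) = 2^29-byte transfer:
s = bandwidth_stats(2^29, [0.020, 0.021, 0.025])
s.max   # ≈ 25 GiB/s, from the fastest run (0.020 s)
```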
Examples:
host2device_bandwidth()
host2device_bandwidth(MiB(1024))
host2device_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidth — Method
p2p_bandwidth(; kwargs...)
Performs a peer-to-peer memory copy benchmark (time measurement) and returns an inter-GPU memory bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
- memsize (default: B(40_000_000)): memory size to be used.
- src (default: 0): source device.
- dst (default: 1): destination device.
- nbench (default: 5): number of time measurements (i.e. p2p memcopies).
- verbose (default: true): set to false to turn off any printing.
- hist (default: false): when true, a UnicodePlots-based histogram is printed.
- times (default: false): toggle printing of measured times.
- alternate (default: false): alternate src and dst, i.e. copy data back and forth.
- dtype (default: Float32): data type to consider.
- io (default: stdout): set the stream where the results should be printed.
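The alternate option can be pictured as flipping the copy direction on every timed iteration. The sketch below mimics that with two host arrays standing in for buffers on src and dst. No GPUs or peer-to-peer transfers are involved. The helper timed_copies is hypothetical, and using the fastest run for the final figure is one choice among several, not necessarily GPUInspector's.

```julia
# CPU stand-in for the `alternate` option (illustrative; real p2p copies go between two GPUs).
# With alternate=true the direction flips on every iteration: a→b, b→a, a→b, ...
function timed_copies(a::Vector, b::Vector; nbench::Integer=5, alternate::Bool=false)
    copyto!(b, a)                                     # warm-up
    times = Float64[]
    for i in 1:nbench
        src, dst = (alternate && iseven(i)) ? (b, a) : (a, b)
        push!(times, @elapsed(copyto!(dst, src)))
    end
    return times
end

# Mirrors the defaults memsize = B(40_000_000) and dtype = Float32: 10^7 elements.
a = rand(Float32, 40_000_000 ÷ sizeof(Float32))
b = similar(a)
times = timed_copies(a, b; alternate=true)
bw = sizeof(a) / minimum(times) / 2^30               # GiB/s of the fastest copy
```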
Examples:
p2p_bandwidth()
p2p_bandwidth(MiB(1024))
p2p_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidth_all — Method
p2p_bandwidth_all(; kwargs...)
Runs p2p_bandwidth for all combinations of available devices. Returns a matrix with the p2p memory bandwidth estimates.
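The following sketch shows one way to fill such a matrix. A measure(src, dst) function stands in for a p2p_bandwidth call over 0-based device indices, matching the src/dst defaults above. The helper bandwidth_all, the dummy measure, and leaving the diagonal as nothing are all assumptions of this sketch rather than documented GPUInspector behavior.

```julia
# Hypothetical sketch of an all-pairs sweep: `measure(src, dst)` stands in for a
# p2p_bandwidth call; the diagonal (src == dst) is left as `nothing` in this sketch.
function bandwidth_all(measure, ndevices::Integer)
    M = Matrix{Union{Nothing, Float64}}(nothing, ndevices, ndevices)
    for src in 0:ndevices-1, dst in 0:ndevices-1
        src == dst && continue
        M[src+1, dst+1] = measure(src, dst)   # devices are 0-based, matrix indices 1-based
    end
    return M
end

# Dummy measure for illustration (no GPUs involved):
M = bandwidth_all((s, d) -> 10.0 * (s + d), 3)
```

Row i, column j then holds the estimate for copies from device i-1 to device j-1.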
GPUInspector.p2p_bandwidth_bidirectional — Method
Same as p2p_bandwidth but measures the bidirectional bandwidth (copying data back and forth).
GPUInspector.p2p_bandwidth_bidirectional_all — Method
Same as p2p_bandwidth_all but measures the bidirectional bandwidth (copying data back and forth).
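A bidirectional figure is only comparable to a unidirectional one once the byte accounting is fixed. The sketch below assumes that memsize bytes travel in each direction within a single measured time t, so the total traffic is twice memsize. The two helper names are hypothetical, and GPUInspector's actual accounting is not confirmed here.

```julia
# Assumed accounting for a bidirectional measurement: `nbytes` move in each direction
# during the same timed interval, so the total traffic is 2 * nbytes.
bidirectional_gibs(nbytes::Integer, t::Real) = 2 * nbytes / t / 2^30

# One-directional copy of the same size, for comparison.
unidirectional_gibs(nbytes::Integer, t::Real) = nbytes / t / 2^30

# 1 GiB each way in 0.1 s: 20 GiB/s combined, vs. 10 GiB/s for a one-way copy in 0.1 s.
bidirectional_gibs(2^30, 0.1), unidirectional_gibs(2^30, 0.1)
```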