Data Bandwidth
Index
GPUInspector.host2device_bandwidth
GPUInspector.memory_bandwidth
GPUInspector.memory_bandwidth_saxpy
GPUInspector.memory_bandwidth_saxpy_scaling
GPUInspector.memory_bandwidth_scaling
GPUInspector.p2p_bandwidth
GPUInspector.p2p_bandwidth_all
GPUInspector.p2p_bandwidth_bidirectional
GPUInspector.p2p_bandwidth_bidirectional_all
GPUInspector.theoretical_memory_bandwidth
References
GPUInspector.memory_bandwidth — Function
memory_bandwidth([memsize; kwargs...])
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a memcpy of a certain amount of data (as specified by memsize).
Keyword arguments:
device (default: CUDA.device()): CUDA device to be used.
dtype (default: Cchar): element type of the vectors.
verbose (default: true): toggle printing.
See also: memory_bandwidth_scaling.
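The measurement idea described above (time a copy of a given number of bytes, then divide bytes by time) can be sketched on the host with plain Julia arrays. This is only an illustration, not GPUInspector's implementation: the helper name copy_bandwidth_gib is hypothetical, it copies between CPU arrays instead of device buffers, and counting the copied bytes once (rather than once for the read and once for the write) is an assumption.

```julia
# Hypothetical helper: estimate copy bandwidth in GiB/s on the host.
# Assumptions: bytes are counted once per copy, and the best (shortest)
# of `nbench` timings is used.
function copy_bandwidth_gib(nbytes::Integer; dtype=Cchar, nbench::Integer=5)
    n = nbytes ÷ sizeof(dtype)          # number of elements that fit into nbytes
    src = rand(dtype, n)
    dst = similar(src)
    copyto!(dst, src)                   # warm-up copy, excluded from the timings
    t = minimum(@elapsed(copyto!(dst, src)) for _ in 1:nbench)
    return (n * sizeof(dtype)) / 2^30 / t
end

bw = copy_bandwidth_gib(2^26)           # 64 MiB of data
println("host copy bandwidth ≈ ", round(bw; digits=2), " GiB/s")
```

On a GPU the same arithmetic applies, but the timing has to account for asynchronous execution, which is why a dedicated benchmark function is useful.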
GPUInspector.memory_bandwidth_scaling — Method
memory_bandwidth_scaling() -> datasizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth) as a function of data size. If verbose=true (default), displays a Unicode plot. Returns the considered data sizes and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth.
GPUInspector.theoretical_memory_bandwidth — Function
theoretical_memory_bandwidth(; device::CuDevice=CUDA.device(); verbose=true)
Estimates the theoretical maximal GPU memory bandwidth in GiB/s.
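A theoretical peak of this kind is commonly derived from the memory clock rate and the memory bus width. The sketch below assumes double-data-rate memory (factor 2). The function name peak_bandwidth_gib and the example numbers (1215 MHz, 5120 bit) are illustrative assumptions, not values queried from a device or taken from GPUInspector's implementation.

```julia
# Hypothetical formula sketch: peak bandwidth from clock rate and bus width.
# Assumption: two transfers per clock cycle (double data rate).
function peak_bandwidth_gib(clock_hz::Real, bus_width_bits::Integer)
    bytes_per_transfer = bus_width_bits / 8
    return 2 * clock_hz * bytes_per_transfer / 2^30
end

# Illustrative inputs only: 1215 MHz memory clock, 5120-bit bus.
println(round(peak_bandwidth_gib(1215e6, 5120); digits=1), " GiB/s")
```

Measured bandwidths are expected to stay below such a bound, so comparing the two gives a rough efficiency figure.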
GPUInspector.memory_bandwidth_saxpy — Method
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a SAXPY, i.e. a * x[i] + y[i].
Keyword arguments:
device (default: CUDA.device()): CUDA device to be used.
dtype (default: Float32): element type of the vectors.
size (default: 2^20 * 10): length of the vectors.
nbench (default: 5): number of measurements to be performed, the best of which is used for the GiB/s computation.
verbose (default: true): toggle printing.
cublas (default: true): toggle between CUDA.axpy! and a custom saxpy_gpu_kernel!.
See also: memory_bandwidth_saxpy_scaling.
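The SAXPY-based estimate can be sketched on the host as follows. This is a CPU analogue under stated assumptions, not the package's GPU code: the names saxpy! and saxpy_bandwidth_gib are hypothetical, and the traffic model (three values moved per element: read x, read y, write y) is an assumption. The keyword names mirror the ones listed above.

```julia
# Hypothetical host-side SAXPY: y[i] = a * x[i] + y[i].
function saxpy!(a, x, y)
    @inbounds @simd for i in eachindex(x, y)
        y[i] = a * x[i] + y[i]
    end
    return y
end

# Assumption: each element moves 3 values (read x, read y, write y);
# the best (shortest) of `nbench` timings is used.
function saxpy_bandwidth_gib(; dtype=Float32, size::Integer=2^20 * 10, nbench::Integer=5)
    a = dtype(pi)
    x = rand(dtype, size)
    y = rand(dtype, size)
    saxpy!(a, x, y)                     # warm-up run (triggers compilation)
    t = minimum(@elapsed(saxpy!(a, x, y)) for _ in 1:nbench)
    return 3 * sizeof(dtype) * size / 2^30 / t
end

println("host SAXPY bandwidth ≈ ", round(saxpy_bandwidth_gib(); digits=2), " GiB/s")
```

Because SAXPY performs only two floating-point operations per element, its runtime is dominated by memory traffic, which is what makes it a reasonable bandwidth probe.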
GPUInspector.memory_bandwidth_saxpy_scaling — Method
memory_bandwidth_saxpy_scaling() -> sizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth_saxpy) as a function of vector length. If verbose=true (default), displays a Unicode plot. Returns the considered vector lengths and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth_saxpy.
GPUInspector.host2device_bandwidth — Function
host2device_bandwidth([memsize::UnitPrefixedBytes=GiB(0.5)]; kwargs...)
Performs a host-to-device memory copy benchmark (time measurement) and returns the host-to-device bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
nbench (default: 10): number of time measurements (i.e. host-to-device memcopies).
verbose (default: true): set to false to turn off any printing.
stats (default: false): when true, shows statistical information about the benchmark.
times (default: false): toggle printing of measured times.
dtype (default: Cchar): data type used.
Examples:
host2device_bandwidth()
host2device_bandwidth(MiB(1024))
host2device_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidth — Function
p2p_bandwidth([memsize::UnitPrefixedBytes]; kwargs...)
Performs a peer-to-peer memory copy benchmark (time measurement) and returns an inter-GPU memory bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
src (default: 0): source device.
dst (default: 1): destination device.
nbench (default: 5): number of time measurements (i.e. p2p memcopies).
verbose (default: true): set to false to turn off any printing.
hist (default: false): when true, a UnicodePlots-based histogram is printed.
times (default: false): toggle printing of measured times.
alternate (default: false): alternate src and dst, i.e. copy data back and forth.
dtype (default: Float32): see alloc_mem.
Examples:
p2p_bandwidth()
p2p_bandwidth(MiB(1024))
p2p_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidth_all — Method
p2p_bandwidth_all(args...; kwargs...)
Runs p2p_bandwidth for all combinations of devices. Returns a matrix with the p2p memory bandwidth estimates.
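How a per-pair benchmark turns into a matrix can be sketched in plain Julia. Everything here is hypothetical: pairwise_bandwidths and fake_measure are illustrative names, fake_measure stands in for a real benchmark call, and leaving the diagonal as NaN (no self-copy) is an assumption about the layout rather than a statement about what p2p_bandwidth_all returns.

```julia
# Hypothetical sketch: one estimate per ordered (src, dst) device pair.
# Assumption: the diagonal (src == dst) is left as NaN.
function pairwise_bandwidths(measure, ndevices::Integer)
    B = fill(NaN, ndevices, ndevices)
    for src in 0:(ndevices - 1), dst in 0:(ndevices - 1)
        src == dst && continue
        # Device ids are 0-based, matrix indices are 1-based.
        B[src + 1, dst + 1] = measure(src, dst)
    end
    return B
end

# Stand-in for a real measurement; returns made-up numbers.
fake_measure(src, dst) = 100.0 + 10src + dst

B = pairwise_bandwidths(fake_measure, 3)
display(B)
```

Reading the result row by row gives the bandwidth from one source device to every other device, which makes asymmetric links easy to spot.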
GPUInspector.p2p_bandwidth_bidirectional — Function
Same as p2p_bandwidth but measures the bidirectional bandwidth (copying data back and forth).
GPUInspector.p2p_bandwidth_bidirectional_all — Method
Same as p2p_bandwidth_all but measures the bidirectional bandwidth (copying data back and forth).