Data Bandwidth
Index
GPUInspector.host2device_bandwidth
GPUInspector.memory_bandwidth
GPUInspector.memory_bandwidth_saxpy
GPUInspector.memory_bandwidth_saxpy_scaling
GPUInspector.memory_bandwidth_scaling
GPUInspector.p2p_bandwidth
GPUInspector.p2p_bandwidth_all
GPUInspector.p2p_bandwidth_bidirectional
GPUInspector.p2p_bandwidth_bidirectional_all
GPUInspector.theoretical_memory_bandwidth
References
GPUInspector.memory_bandwidth — Function
memory_bandwidth([memsize; kwargs...])
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a memcpy of a certain amount of data (as specified by memsize).
Keyword arguments:
device (default: CUDA.device()): CUDA device to be used.
dtype (default: Cchar): element type of the vectors.
verbose (default: true): toggle printing.
See also: memory_bandwidth_scaling.
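The conversion from a timed copy to a GiB/s figure can be sketched on the host, without a GPU. This is a minimal illustration, not GPUInspector's implementation: the names gibps and host_copy_bandwidth are hypothetical, copyto! on a CPU vector stands in for the device memcpy, and the factor of 2 (one read plus one write per byte) is an assumption about how the traffic is counted.

```julia
# Convert a timed copy of `nbytes` bytes into GiB/s.
# `traffic_factor = 2` counts one read plus one write per byte (an assumption;
# GPUInspector may count the traffic differently).
gibps(nbytes, seconds; traffic_factor=2) = traffic_factor * nbytes / seconds / 2^30

# Host-side stand-in for the GPU benchmark: time a copy of `nbytes` bytes
# `nbench` times and convert the fastest run to GiB/s.
function host_copy_bandwidth(nbytes::Integer; nbench::Integer=5)
    src = rand(Cchar, nbytes)
    dst = similar(src)
    copyto!(dst, src)  # warm-up run, excluded from the timing
    t = minimum(@elapsed(copyto!(dst, src)) for _ in 1:nbench)
    return gibps(nbytes, t)
end

host_copy_bandwidth(2^24)
```

Taking the minimum over several runs reduces the influence of one-off delays, which is why a best-of-n scheme is a common choice for peak estimates.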
GPUInspector.memory_bandwidth_scaling — Method
memory_bandwidth_scaling() -> datasizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth) as a function of data size. If verbose=true (default), displays a Unicode plot. Returns the considered data sizes and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth.
GPUInspector.theoretical_memory_bandwidth — Function
theoretical_memory_bandwidth(; device::CuDevice=CUDA.device(), verbose=true)
Estimates the theoretical maximal GPU memory bandwidth in GiB/s.
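A theoretical peak of this kind is commonly derived from the memory clock rate and the memory bus width. The sketch below shows that arithmetic only; the function name peak_bandwidth_gib is hypothetical, the double-data-rate factor of 2 is an assumption, and whether GPUInspector uses exactly this formula is not confirmed by the text above. The input values are illustrative.

```julia
# Peak bandwidth in GiB/s from the memory clock (Hz) and bus width (bits).
# `transfers_per_clock = 2` assumes double-data-rate memory (an assumption).
function peak_bandwidth_gib(clock_hz, bus_width_bits; transfers_per_clock=2)
    bytes_per_second = transfers_per_clock * clock_hz * bus_width_bits / 8
    return bytes_per_second / 2^30
end

# Illustrative inputs: a 1215 MHz memory clock and a 5120-bit bus.
peak_bandwidth_gib(1215e6, 5120)
```

Measured values from a memcpy or SAXPY benchmark typically stay below such a figure, so it serves as an upper reference point rather than an expected result.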
GPUInspector.memory_bandwidth_saxpy — Method
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a SAXPY, i.e. a * x[i] + y[i].
Keyword arguments:
device (default: CUDA.device()): CUDA device to be used.
dtype (default: Float32): element type of the vectors.
size (default: 2^20 * 10): length of the vectors.
nbench (default: 5): number of measurements to be performed, the best of which is used for the GiB/s computation.
verbose (default: true): toggle printing.
cublas (default: true): toggle between CUDA.axpy! and a custom saxpy_gpu_kernel!.
See also: memory_bandwidth_saxpy_scaling.
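The SAXPY-based estimate can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the package's GPU code: saxpy!, saxpy_bytes and saxpy_bandwidth are hypothetical names, the loop runs on the host, and the traffic is counted as three vector passes (read x, read y, write y), which is an assumption about the accounting.

```julia
# y[i] = a * x[i] + y[i], written as a plain CPU loop.
function saxpy!(a, x, y)
    @inbounds for i in eachindex(x, y)
        y[i] = a * x[i] + y[i]
    end
    return y
end

# Bytes moved per SAXPY: read x, read y, write y (assumed accounting).
saxpy_bytes(n, ::Type{T}) where {T} = 3 * n * sizeof(T)

# Time the kernel `nbench` times and use the best run for the GiB/s figure.
# The keyword names mirror the ones documented above.
function saxpy_bandwidth(; dtype=Float32, size=2^20 * 10, nbench=5)
    a = dtype(2)
    x = ones(dtype, size)
    y = zeros(dtype, size)
    saxpy!(a, x, y)  # warm-up run, excluded from the timing
    t = minimum(@elapsed(saxpy!(a, x, y)) for _ in 1:nbench)
    return saxpy_bytes(size, dtype) / t / 2^30
end

saxpy_bandwidth()
```

SAXPY does very little arithmetic per byte, so its runtime is dominated by memory traffic, which is what makes it usable as a bandwidth probe.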
GPUInspector.memory_bandwidth_saxpy_scaling — Method
memory_bandwidth_saxpy_scaling() -> sizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth_saxpy) as a function of vector length. If verbose=true (default), displays a Unicode plot. Returns the considered lengths and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth_saxpy.
GPUInspector.host2device_bandwidth — Function
host2device_bandwidth([memsize::UnitPrefixedBytes=GiB(0.5)]; kwargs...)
Performs a host-to-device memory copy benchmark (time measurement) and returns the host-to-device bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
nbench (default: 10): number of time measurements (i.e. host-to-device memcopies).
verbose (default: true): set to false to turn off any printing.
stats (default: false): when true, shows statistical information about the benchmark.
times (default: false): toggle printing of measured times.
dtype (default: Cchar): element type of the copied data.
Examples:
host2device_bandwidth()
host2device_bandwidth(MiB(1024))
host2device_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidth — Function
p2p_bandwidth([memsize::UnitPrefixedBytes]; kwargs...)
Performs a peer-to-peer memory copy benchmark (time measurement) and returns an inter-GPU memory bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
src (default: 0): source device.
dst (default: 1): destination device.
nbench (default: 5): number of time measurements (i.e. p2p memcopies).
verbose (default: true): set to false to turn off any printing.
hist (default: false): when true, a UnicodePlots-based histogram is printed.
times (default: false): toggle printing of measured times.
alternate (default: false): alternate src and dst, i.e. copy data back and forth.
dtype (default: Float32): see alloc_mem.
Examples:
p2p_bandwidth()
p2p_bandwidth(MiB(1024))
p2p_bandwidth(KiB(20_000); dtype=Int32)
GPUInspector.p2p_bandwidth_all — Method
p2p_bandwidth_all(args...; kwargs...)
Runs p2p_bandwidth for all combinations of devices. Returns a matrix with the p2p memory bandwidth estimates.
GPUInspector.p2p_bandwidth_bidirectional — Function
Same as p2p_bandwidth but measures the bidirectional bandwidth (copying data back and forth).
GPUInspector.p2p_bandwidth_bidirectional_all — Method
Same as p2p_bandwidth_all but measures the bidirectional bandwidth (copying data back and forth).