Data Bandwidth
Index
GPUInspector.host2device_bandwidth
GPUInspector.memory_bandwidth
GPUInspector.memory_bandwidth_saxpy
GPUInspector.memory_bandwidth_saxpy_scaling
GPUInspector.memory_bandwidth_scaling
GPUInspector.p2p_bandwidth
GPUInspector.p2p_bandwidth_all
GPUInspector.p2p_bandwidth_bidirectional
GPUInspector.p2p_bandwidth_bidirectional_all
GPUInspector.theoretical_memory_bandwidth
References
GPUInspector.memory_bandwidth (Method)
memory_bandwidth(; kwargs...)
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a memcpy of a certain amount of data (as specified by memsize).
Keyword arguments:
memsize (default: GiB(0.5)): memory size to be used.
device (default: e.g. CUDA.device()): GPU device to be used.
dtype (default: Cchar): element type of the vectors.
verbose (default: true): toggle printing.
io (default: stdout): set the stream where the results should be printed.
See also: memory_bandwidth_scaling.
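The measurement idea can be illustrated without a GPU: time a copy of a buffer of known size and divide the number of bytes by the elapsed time. The following is a minimal host-side (CPU) sketch in plain Julia, not GPUInspector's implementation. The helper name copy_bandwidth_gib is hypothetical, and counting each copied byte once is an assumed convention.

```julia
# Hypothetical helper: estimate copy bandwidth in GiB/s on the host (CPU).
# It mirrors the idea of timing a memcpy; it is not GPUInspector code.
function copy_bandwidth_gib(nbytes::Integer; dtype::Type=UInt8, nbench::Integer=5)
    n = nbytes ÷ sizeof(dtype)            # number of elements that fit in nbytes
    src = rand(dtype, n)
    dst = similar(src)
    copyto!(dst, src)                     # warm-up run (also triggers compilation)
    # Take the fastest of nbench timed copies.
    t = minimum(@elapsed(copyto!(dst, src)) for _ in 1:nbench)
    return n * sizeof(dtype) / 2^30 / t   # bytes -> GiB, divided by seconds
end

bw = copy_bandwidth_gib(64 * 2^20)        # 64 MiB buffer
println("copy bandwidth ≈ ", round(bw; digits=2), " GiB/s")
```

On a GPU the timed operation would be a device-side copy, and the timing has to account for asynchronous execution, which this host-only sketch does not cover.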
GPUInspector.memory_bandwidth_saxpy (Method)
Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a SAXPY, i.e. a * x[i] + y[i].
Keyword arguments:
device (default: e.g. CUDA.device()): GPU device to be used.
dtype (default: Float32): element type of the vectors.
size (default: 2^20 * 10): length of the vectors.
nbench (default: 5): number of measurements to be performed, the best of which is used for the GiB/s computation.
verbose (default: true): toggle printing.
io (default: stdout): set the stream where the results should be printed.
See also: memory_bandwidth_saxpy_scaling.
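To make the GiB/s computation concrete: a SAXPY over vectors of length n reads x, reads y, and writes one result vector, so roughly 3 * sizeof(dtype) * n bytes are moved per run. The sketch below does this accounting on the CPU in plain Julia. The helper name saxpy_bandwidth_gib and its len keyword are hypothetical, and the factor of 3 is an assumption about how memory traffic is counted that may differ from GPUInspector's convention.

```julia
# Hypothetical helper, CPU-only: time y .= a .* x .+ y and convert to GiB/s.
function saxpy_bandwidth_gib(; dtype::Type=Float32, len::Integer=2^20 * 10, nbench::Integer=5)
    a = dtype(2)
    x = rand(dtype, len)
    y = rand(dtype, len)
    saxpy!(y, a, x) = (y .= a .* x .+ y; nothing)   # fused, in-place update of y
    saxpy!(y, a, x)                                 # warm-up run (compilation)
    # Best of nbench measurements, as described for the nbench keyword.
    t = minimum(@elapsed(saxpy!(y, a, x)) for _ in 1:nbench)
    bytes = 3 * sizeof(dtype) * len                 # assumed: read x, read y, write y
    return bytes / 2^30 / t
end

println("SAXPY bandwidth ≈ ", round(saxpy_bandwidth_gib(); digits=2), " GiB/s")
```

Using the minimum time rather than the mean makes the estimate less sensitive to one-off slowdowns, which suits a peak-bandwidth measurement.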
GPUInspector.memory_bandwidth_saxpy_scaling (Method)
memory_bandwidth_saxpy_scaling() -> sizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth_saxpy) as a function of vector length. If verbose=true (default), displays a unicode plot. Returns the considered vector lengths and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth_saxpy.
GPUInspector.memory_bandwidth_scaling (Method)
memory_bandwidth_scaling() -> datasizes, bandwidths
Measures the memory bandwidth (via memory_bandwidth) as a function of data size. If verbose=true (default), displays a unicode plot. Returns the considered data sizes and the corresponding bandwidths in GiB/s. For further options, see memory_bandwidth.
GPUInspector.theoretical_memory_bandwidth (Method)
theoretical_memory_bandwidth(; device, verbose)
Estimates the theoretical maximal GPU memory bandwidth in GiB/s.
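A common way to arrive at such a theoretical figure is to multiply the memory clock rate by the memory bus width and by the number of transfers per clock cycle. The sketch below shows that arithmetic in plain Julia. It is an assumption about the general approach, not a statement of how GPUInspector computes the value, and the helper name peak_bandwidth_gib and its parameters are hypothetical.

```julia
# Hypothetical helper: peak bandwidth in GiB/s from clock rate and bus width.
# Assumes double-data-rate memory (two transfers per clock cycle) by default.
function peak_bandwidth_gib(mem_clock_hz::Real, bus_width_bits::Integer;
                            transfers_per_clock::Real=2)
    bus_width_bytes = bus_width_bits / 8
    return transfers_per_clock * mem_clock_hz * bus_width_bytes / 2^30
end

# Illustrative numbers only (not taken from a specific data sheet):
# 1 GHz memory clock, 4096-bit bus.
peak_bandwidth_gib(1.0e9, 4096)   # ≈ 953.67
```

Measured bandwidths from the benchmarks above are generally expected to fall below this kind of upper bound.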
GPUInspector.host2device_bandwidth (Method)
host2device_bandwidth(; kwargs...)
Performs a host-to-device memory copy benchmark (time measurement) and returns the host-to-device bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
memsize (default: GiB(0.5)): memory size to be used.
nbench (default: 10): number of time measurements (i.e. host-to-device memcopies).
verbose (default: true): set to false to turn off any printing.
stats (default: false): when true, shows statistical information about the benchmark.
times (default: false): toggle printing of measured times.
dtype (default: Cchar): used data type.
io (default: stdout): set the stream where the results should be printed.
Examples:
host2device_bandwidth()
host2device_bandwidth(; memsize=MiB(1024))
host2device_bandwidth(; memsize=KiB(20_000), dtype=Int32)
GPUInspector.p2p_bandwidth (Method)
p2p_bandwidth(; kwargs...)
Performs a peer-to-peer memory copy benchmark (time measurement) and returns an inter-GPU memory bandwidth estimate (in GiB/s) derived from it.
Keyword arguments:
memsize (default: B(40_000_000)): memory size to be used.
src (default: 0): source device.
dst (default: 1): destination device.
nbench (default: 5): number of time measurements (i.e. p2p memcopies).
verbose (default: true): set to false to turn off any printing.
hist (default: false): when true, a UnicodePlots-based histogram is printed.
times (default: false): toggle printing of measured times.
alternate (default: false): alternate src and dst, i.e. copy data back and forth.
dtype (default: Float32): data type to consider.
io (default: stdout): set the stream where the results should be printed.
Examples:
p2p_bandwidth()
p2p_bandwidth(; memsize=MiB(1024))
p2p_bandwidth(; memsize=KiB(20_000), dtype=Int32)
GPUInspector.p2p_bandwidth_all (Method)
p2p_bandwidth_all(; kwargs...)
Run p2p_bandwidth for all combinations of available devices. Returns a matrix with the p2p memory bandwidth estimates.
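The "all combinations" pattern can be sketched as a loop over ordered device pairs that fills a square matrix. The example below is a GPU-free illustration in plain Julia: pairwise_bandwidths is a hypothetical helper, the measurement is passed in as a function, and leaving the diagonal as NaN is an assumption made here, not necessarily what p2p_bandwidth_all returns.

```julia
# Hypothetical sketch: build an ndev × ndev matrix by calling measure(src, dst)
# for every ordered pair of distinct devices. Diagonal entries remain NaN.
function pairwise_bandwidths(measure, ndev::Integer)
    result = fill(NaN, ndev, ndev)
    for src in 0:ndev-1, dst in 0:ndev-1
        src == dst && continue                        # skip copies onto the same device
        result[src + 1, dst + 1] = measure(src, dst)  # device ids are 0-based
    end
    return result
end

# Stand-in measurement so the sketch runs without any GPU:
fake_measure(src, dst) = 10.0 * (src + 1) + dst
m = pairwise_bandwidths(fake_measure, 2)
```

With real hardware, the stand-in would be replaced by a call such as p2p_bandwidth with the src and dst keyword arguments documented above and verbose=false. That substitution is untested here.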
GPUInspector.p2p_bandwidth_bidirectional (Method)
Same as p2p_bandwidth but measures the bidirectional bandwidth (copying data back and forth).
GPUInspector.p2p_bandwidth_bidirectional_all (Method)
Same as p2p_bandwidth_all but measures the bidirectional bandwidth (copying data back and forth).