Data Bandwidth

References

GPUInspector.memory_bandwidth (Method)
memory_bandwidth(; kwargs...)

Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a memcpy of a certain amount of data (as specified by memsize).

Keyword arguments:

  • memsize (default: GiB(0.5)): memory size to be used.
  • device (default: e.g. CUDA.device()): GPU device to be used.
  • dtype (default: Cchar): element type of the vectors.
  • verbose (default: true): toggle printing.
  • io (default: stdout): set the stream where the results should be printed.

See also: memory_bandwidth_scaling.
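A rough, CPU-only illustration of the memcpy-timing approach is sketched below in plain Julia. It is not GPUInspector's implementation: `cpu_memcpy_bandwidth` is a hypothetical name, host `Vector`s stand in for GPU buffers, and the byte count assumes that a copy of `nbytes` both reads and writes `nbytes` (GPUInspector may count traffic differently).

```julia
# Hypothetical CPU analogue of a memcpy bandwidth benchmark (not GPUInspector code).
function cpu_memcpy_bandwidth(; nbytes::Integer = 2^26, dtype::Type = Cchar, nbench::Integer = 5)
    n = nbytes ÷ sizeof(dtype)          # number of elements that fit into nbytes
    src = rand(dtype, n)
    dst = similar(src)
    copyto!(dst, src)                   # warm-up copy, excluded from the timings
    best = Inf
    for _ in 1:nbench
        t0 = time_ns()
        copyto!(dst, src)
        best = min(best, (time_ns() - t0) / 1e9)   # elapsed seconds of this copy
    end
    # Assumption: the copy reads n * sizeof(dtype) bytes and writes the same amount again.
    traffic = 2 * n * sizeof(dtype)
    return traffic / best / 2^30        # GiB/s from the fastest copy
end
```

Taking the fastest of several runs, as here, filters out transient slowdowns; the warm-up copy keeps first-touch and compilation costs out of the measurement.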

GPUInspector.memory_bandwidth_saxpy (Method)
memory_bandwidth_saxpy(; kwargs...)

Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a SAXPY, i.e. y[i] = a * x[i] + y[i].

Keyword arguments:

  • device (default: e.g. CUDA.device()): GPU device to be used.
  • dtype (default: Float32): element type of the vectors.
  • size (default: 2^20 * 10): length of the vectors.
  • nbench (default: 5): number of measurements to be performed, the best of which is used for the GiB/s computation.
  • verbose (default: true): toggle printing.
  • io (default: stdout): set the stream where the results should be printed.

See also: memory_bandwidth_saxpy_scaling.
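The SAXPY variant can be sketched the same way. In this hypothetical CPU analogue (`cpu_saxpy_bandwidth` is an illustrative name, and `n` plays the role of the `size` keyword), each update `y[i] = a * x[i] + y[i]` is assumed to read `x[i]` and `y[i]` and write `y[i]`, i.e. `3 * sizeof(dtype)` bytes per element. This STREAM-style accounting is an assumption, not necessarily what GPUInspector uses.

```julia
# Hypothetical CPU analogue of a SAXPY bandwidth benchmark (not GPUInspector code).
function cpu_saxpy_bandwidth(; dtype::Type = Float32, n::Integer = 2^20 * 10, nbench::Integer = 5)
    x = rand(dtype, n)
    y = rand(dtype, n)
    a = dtype(2)
    y .= a .* x .+ y                    # warm-up: compiles the fused broadcast kernel
    best = Inf
    for _ in 1:nbench
        t0 = time_ns()
        y .= a .* x .+ y                # one fused pass: y[i] = a * x[i] + y[i]
        best = min(best, (time_ns() - t0) / 1e9)
    end
    # Assumption: per element, x[i] and y[i] are read and y[i] is written.
    traffic = 3 * n * sizeof(dtype)
    return traffic / best / 2^30        # GiB/s from the fastest run
end
```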

GPUInspector.host2device_bandwidth (Method)
host2device_bandwidth(; kwargs...)

Performs a host-to-device memory copy benchmark (time measurement) and returns the host-to-device bandwidth estimate (in GiB/s) derived from it.

Keyword arguments:

  • memsize (default: GiB(0.5)): memory size to be used.
  • nbench (default: 10): number of time measurements (i.e. host-to-device memcopies).
  • verbose (default: true): set to false to turn off any printing.
  • stats (default: false): when true shows statistical information about the benchmark.
  • times (default: false): toggle printing of measured times.
  • dtype (default: Cchar): used data type.
  • io (default: stdout): set the stream where the results should be printed.

Examples:

host2device_bandwidth()
host2device_bandwidth(; memsize=MiB(1024))
host2device_bandwidth(; memsize=KiB(20_000), dtype=Int32)
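When several measurements are taken (`nbench`), they can be summarized as bandwidths. The helper below is hypothetical and not part of GPUInspector; it assumes each host-to-device copy transfers `nbytes` exactly once and converts every measured time to GiB/s before computing the summary.

```julia
using Statistics: mean, median

# Hypothetical helper (not GPUInspector API): bandwidth statistics in GiB/s from
# measured copy times in seconds, assuming `nbytes` are transferred once per copy.
function bandwidth_stats(times::AbstractVector{<:Real}, nbytes::Integer)
    bws = nbytes ./ times ./ 2^30       # one GiB/s value per measurement
    return (min = minimum(bws), median = median(bws), mean = mean(bws), max = maximum(bws))
end

# Three made-up timings for a 0.5 GiB (2^29 byte) transfer.
s = bandwidth_stats([0.5, 0.25, 1.0], 2^29)   # min = 0.5, median = 1.0, max = 2.0
```

Note that the fastest time yields the maximum bandwidth, so a "best of nbench" estimate corresponds to `s.max` here.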
GPUInspector.p2p_bandwidth (Method)
p2p_bandwidth(; kwargs...)

Performs a peer-to-peer memory copy benchmark (time measurement) and returns an inter-GPU memory bandwidth estimate (in GiB/s) derived from it.

Keyword arguments:

  • memsize (default: B(40_000_000)): memory size to be used.
  • src (default: 0): source device
  • dst (default: 1): destination device
  • nbench (default: 5): number of time measurements (i.e. p2p memcopies)
  • verbose (default: true): set to false to turn off any printing.
  • hist (default: false): when true, a UnicodePlots-based histogram is printed.
  • times (default: false): toggle printing of measured times.
  • alternate (default: false): alternate src and dst, i.e. copy data back and forth.
  • dtype (default: Float32): data type to consider.
  • io (default: stdout): set the stream where the results should be printed.
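The effect of `alternate` can be mimicked on the host. In this hypothetical sketch (not GPUInspector code; `cpu_pingpong_bandwidth` is an illustrative name), two `Vector`s stand in for the `src` and `dst` devices. With `alternate=true` every second copy runs in the opposite direction, and each copy is assumed to move its payload across the link once.

```julia
# Hypothetical CPU analogue of the `alternate` option (not GPUInspector code).
function cpu_pingpong_bandwidth(; nbytes::Integer = 40_000_000, dtype::Type = Float32,
                                nbench::Integer = 5, alternate::Bool = false)
    n = nbytes ÷ sizeof(dtype)
    a = rand(dtype, n)                  # buffer standing in for the `src` device
    b = zeros(dtype, n)                 # buffer standing in for the `dst` device
    copyto!(b, a)                       # warm-up copy
    best = Inf
    for i in 1:nbench
        # alternate=false: always a -> b; alternate=true: odd i copies a -> b, even i copies b -> a
        dst, src = (alternate && iseven(i)) ? (a, b) : (b, a)
        t0 = time_ns()
        copyto!(dst, src)
        best = min(best, (time_ns() - t0) / 1e9)
    end
    return n * sizeof(dtype) / best / 2^30   # payload moved once per copy, in GiB/s
end
```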

Examples:

p2p_bandwidth()
p2p_bandwidth(; memsize=MiB(1024))
p2p_bandwidth(; memsize=KiB(20_000), dtype=Int32)
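For orientation, the memory sizes used in the examples translate into element counts as follows, assuming the buffer length is `memsize` divided by `sizeof(dtype)`. `nelements` is a hypothetical helper; `KiB` and `MiB` are binary units (1024 and 1024^2 bytes).

```julia
# Hypothetical helper: how many elements of `dtype` fit into `nbytes` bytes.
nelements(nbytes::Integer, dtype::Type) = nbytes ÷ sizeof(dtype)

nelements(40_000_000, Float32)     # default p2p memsize B(40_000_000): 10_000_000 elements
nelements(20_000 * 1024, Int32)    # KiB(20_000) with dtype=Int32: 5_120_000 elements
nelements(1024 * 1024^2, Float32)  # MiB(1024) with the default Float32: 268_435_456 elements
```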