Data Bandwidth



memory_bandwidth([memsize; kwargs...])

Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a memcpy of a certain amount of data (as specified by memsize).

Keyword arguments:

  • device (default: CUDA.device()): CUDA device to be used.
  • dtype (default: Cchar): element type of the vectors.
  • verbose (default: true): toggle printing.

See also: memory_bandwidth_scaling.
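The arithmetic behind a memcpy-based estimate can be sketched on the CPU, without a GPU. This is only an illustration under stated assumptions, not the package's implementation: `gibps` and `memcpy_bandwidth_sketch` are hypothetical names, `copyto!` on host arrays stands in for the device memcpy, and counting both the read and the write traffic (the factor 2) is an assumption.

```julia
# Convert a byte count and an elapsed time (in seconds) into GiB/s.
gibps(nbytes, seconds) = nbytes / 2^30 / seconds

# CPU stand-in for the device memcpy: copy `memsize_bytes` worth of elements
# several times and derive a bandwidth from the fastest run.
function memcpy_bandwidth_sketch(memsize_bytes::Integer; dtype=Cchar, nbench=5)
    n = memsize_bytes ÷ sizeof(dtype)
    src = zeros(dtype, n)
    dst = similar(src)
    copyto!(dst, src)  # warm-up run, not timed
    times = [@elapsed(copyto!(dst, src)) for _ in 1:nbench]
    # Assumption: count both the read of `src` and the write of `dst`.
    return gibps(2 * n * sizeof(dtype), minimum(times))
end

bw = memcpy_bandwidth_sketch(2^24)  # 16 MiB
println("estimated copy bandwidth: ", round(bw; digits=2), " GiB/s")
```

The number printed here reflects host memory, not a GPU; only the bytes-over-time conversion carries over.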


memory_bandwidth_saxpy(; kwargs...)

Tries to estimate the peak memory bandwidth of a GPU in GiB/s by measuring the time it takes to perform a SAXPY, i.e. y[i] = a * x[i] + y[i].

Keyword arguments:

  • device (default: CUDA.device()): CUDA device to be used.
  • dtype (default: Float32): element type of the vectors.
  • size (default: 2^20 * 10): length of the vectors.
  • nbench (default: 5): number of measurements to be performed, the best of which is used for the GiB/s computation.
  • verbose (default: true): toggle printing.
  • cublas (default: true): toggle between CUDA.axpy! and a custom saxpy_gpu_kernel!.

See also: memory_bandwidth_saxpy_scaling.
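As a rough illustration of the SAXPY-based measurement, the following CPU-only sketch times a plain loop and converts the best time into GiB/s. The names `saxpy!` and `saxpy_bandwidth_sketch` are hypothetical, and the byte count of three memory accesses per element (read x, read y, write y) is an assumption about how the traffic is counted, not a statement about the package's internals.

```julia
# Plain CPU SAXPY: y[i] = a * x[i] + y[i]
function saxpy!(a, x, y)
    @inbounds @simd for i in eachindex(x, y)
        y[i] = a * x[i] + y[i]
    end
    return y
end

# Keyword names mirror the documented ones; `size` is the vector length.
function saxpy_bandwidth_sketch(; dtype=Float32, size=2^20 * 10, nbench=5)
    a = dtype(2)
    x = ones(dtype, size)
    y = ones(dtype, size)
    saxpy!(a, x, y)  # warm-up run (also triggers compilation), not timed
    times = [@elapsed(saxpy!(a, x, y)) for _ in 1:nbench]
    # Assumption: 3 memory accesses per element (read x, read y, write y).
    nbytes = 3 * size * sizeof(dtype)
    return nbytes / 2^30 / minimum(times)  # best of `nbench` runs
end

bw = saxpy_bandwidth_sketch()
println("estimated SAXPY bandwidth: ", round(bw; digits=2), " GiB/s")
```

Taking the minimum time corresponds to the "best of nbench" rule described above, since the fastest run is the one least disturbed by other activity.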

host2device_bandwidth([memsize::UnitPrefixedBytes=GiB(0.5)]; kwargs...)

Performs a host-to-device memory copy benchmark (time measurement) and returns the host-to-device bandwidth estimate (in GiB/s) derived from it.

Keyword arguments:

  • nbench (default: 10): number of time measurements (i.e. host-to-device memcopies).
  • verbose (default: true): set to false to turn off any printing.
  • stats (default: false): when true, shows statistical information about the benchmark.
  • times (default: false): toggle printing of measured times.
  • dtype (default: Cchar): used data type.


Example:

host2device_bandwidth(KiB(20_000); dtype=Int32)

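To make the relation between memsize and dtype in the call above concrete, here is a small arithmetic sketch. The helper `elements_for` is hypothetical, and the assumption that the buffer holds memsize divided by `sizeof(dtype)` elements is mine, not taken from the package.

```julia
# Hypothetical helper: number of `dtype` elements that fit in `nkib` KiB.
elements_for(nkib::Integer, dtype::Type) = (nkib * 1024) ÷ sizeof(dtype)

nbytes = 20_000 * 1024                 # KiB(20_000) expressed in bytes
n_int32 = elements_for(20_000, Int32)  # 4-byte elements
n_cchar = elements_for(20_000, Cchar)  # 1-byte elements (the default dtype)

println(nbytes, " bytes hold ", n_int32, " Int32 values or ", n_cchar, " Cchar values")
println("size in GiB: ", nbytes / 2^30)
```

Under this assumption, changing dtype changes the element count but not the number of bytes transferred, so the bandwidth estimate should be largely independent of it.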
p2p_bandwidth([memsize::UnitPrefixedBytes]; kwargs...)

Performs a peer-to-peer memory copy benchmark (time measurement) and returns an inter-GPU memory bandwidth estimate (in GiB/s) derived from it.

Keyword arguments:

  • src (default: 0): source device.
  • dst (default: 1): destination device.
  • nbench (default: 5): number of time measurements (i.e. p2p memcopies).
  • verbose (default: true): set to false to turn off any printing.
  • hist (default: false): when true, a UnicodePlots-based histogram is printed.
  • times (default: false): toggle printing of measured times.
  • alternate (default: false): alternate src and dst, i.e. copy data back and forth.
  • dtype (default: Float32): see alloc_mem.


Example:

p2p_bandwidth(KiB(20_000); dtype=Int32)
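
How several time measurements might be turned into bandwidth figures, and what the alternate option means for the copy direction, can be sketched as follows. Both helpers are hypothetical, the timings are made up for illustration, and starting with the src-to-dst direction is an assumption.

```julia
using Statistics  # stdlib, provides `mean`

# Hypothetical helper: turn per-copy times (in seconds) into GiB/s figures.
function bandwidth_summary(nbytes::Integer, times::AbstractVector{<:Real})
    bws = nbytes ./ 2^30 ./ times
    return (min=minimum(bws), max=maximum(bws), mean=mean(bws))
end

# Source/destination pairs for `nbench` copies; with `alternate=true`
# every second copy runs in the reverse direction.
function copy_directions(src, dst, nbench; alternate=false)
    return [(alternate && iseven(i)) ? (dst, src) : (src, dst) for i in 1:nbench]
end

times = [0.010, 0.008, 0.0125]          # made-up timings, in seconds
s = bandwidth_summary(2^30 ÷ 2, times)  # 0.5 GiB per copy
println(s)
println(copy_directions(0, 1, 4; alternate=true))
```

Note that the shortest time yields the largest bandwidth, so a "best" estimate corresponds to `s.max` in this sketch.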