Peakflops
Index
GPUInspector.kernel_fma
GPUInspector.peakflops_gpu
GPUInspector.peakflops_gpu_fmas
GPUInspector.peakflops_gpu_matmul
GPUInspector.peakflops_gpu_matmul_graphs
GPUInspector.peakflops_gpu_matmul_scaling
GPUInspector.peakflops_gpu_wmmas
GPUInspector.theoretical_peakflops_gpu
References
GPUInspector.peakflops_gpu — Method
peakflops_gpu(; tensorcores=hastensorcores(), kwargs...)
Tries to estimate the peak performance of a GPU in TFLOP/s by measuring the time it takes to perform either
_kernel_fma_nfmas() * size many FMAs on CUDA cores (if tensorcores == false), or
_kernel_wmma_nwmmas() many WMMAs on Tensor Cores (if tensorcores == true).
For more keyword argument options see peakflops_gpu_fmas and peakflops_gpu_wmmas.
GPUInspector.theoretical_peakflops_gpu — Method
Estimates the theoretical peak performance of a CUDA device in TFLOP/s.
Keyword arguments:
tensorcores (default: hastensorcores()): toggle usage of Tensor Cores. If false, CUDA cores will be used.
verbose (default: true): toggle printing of information.
device (default: device()): CUDA device to be analyzed.
dtype (default: tensorcores ? Float16 : Float32): element type of the matrices.
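A common back-of-the-envelope way to arrive at such a theoretical number is cores × clock × 2, since one FMA counts as two floating-point operations. The following is a minimal sketch of that formula only; the helper name theoretical_tflops is hypothetical, and GPUInspector may compute its estimate differently (for example, per device attributes or for Tensor Cores). The figures used are the published A100 FP32 specification, chosen purely for illustration.

```julia
# Back-of-the-envelope estimate (an assumption; GPUInspector may compute
# its value differently): every CUDA core retires one FMA, i.e. 2 FLOPs,
# per clock cycle.
theoretical_tflops(ncores::Integer, clock_hz::Real) = 2 * ncores * clock_hz / 1e12

# Illustrative figures: 6912 FP32 cores at a 1.41 GHz boost clock (A100 spec sheet).
tflops = theoretical_tflops(6912, 1.41e9)
println(round(tflops; digits = 2), " TFLOP/s")  # prints "19.49 TFLOP/s"
```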
GPUInspector.peakflops_gpu_matmul — Method
peakflops_gpu_matmul(; device, dtype=Float32, size=2^14, nmatmuls=5, nbench=5, verbose=true)
Tries to estimate the peak performance of a GPU in TFLOP/s by measuring the time it takes to perform nmatmuls many (in-place) matrix-matrix multiplications.
Keyword arguments:
device (default: CUDA.device()): CUDA device to be used.
dtype (default: Float32): element type of the matrices.
size (default: 2^14): matrices will have dimensions (size, size).
nmatmuls (default: 5): number of matmuls that make up the kernel to be timed.
nbench (default: 5): number of measurements to perform; the best one is used for the TFLOP/s computation.
verbose (default: true): toggle printing.
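To make the TFLOP/s arithmetic behind these parameters concrete, here is a minimal sketch assuming the conventional count of 2n^3 FLOPs for an n × n by n × n product. The helper matmul_tflops and the timings are hypothetical and not part of GPUInspector's API; the sketch only shows how size, nmatmuls and the best of nbench timings combine.

```julia
# Hypothetical helper (not part of GPUInspector's API): an (n x n) * (n x n)
# product performs n^3 multiply-adds, conventionally counted as 2n^3 FLOPs.
# Each timed sample runs `nmatmuls` products; the fastest of the `nbench`
# timings (in seconds) is used.
function matmul_tflops(times::AbstractVector{<:Real}; size::Integer = 2^14, nmatmuls::Integer = 5)
    flop = 2.0 * Float64(size)^3 * nmatmuls  # Float64 avoids Int overflow for large sizes
    return flop / minimum(times) / 1e12
end

# Example with made-up timings for nbench = 3 samples:
matmul_tflops([1.3, 1.0, 1.1])  # 2 * (2^14)^3 * 5 FLOPs in 1.0 s ≈ 43.98 TFLOP/s
```

Taking the minimum rather than the mean is what makes the result an estimate of peak rather than typical throughput.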
See also: peakflops_gpu_matmul_scaling, peakflops_gpu_matmul_graphs.
GPUInspector.peakflops_gpu_matmul_graphs — Method
Same as peakflops_gpu_matmul but uses CUDA's graph API to define and launch the kernel.
See also: peakflops_gpu_matmul_scaling.
GPUInspector.peakflops_gpu_matmul_scaling — Method
peakflops_gpu_matmul_scaling(peakflops_func = peakflops_gpu_matmul; verbose=true) -> sizes, flops
Assesses how the performance measured by the given peakflops_func (defaults to peakflops_gpu_matmul) scales with increasing matrix size. If verbose=true (default), displays a Unicode plot. Returns the considered sizes and the corresponding TFLOP/s values. For further options, see peakflops_gpu_matmul.
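The sweep described above can be sketched as a loop over increasing sizes that collects one TFLOP/s value per size. So that the sketch runs without a GPU, mock_peakflops is a hypothetical stand-in producing a made-up saturating curve (not real measurements); scaling_sketch and the size range 2^10 to 2^14 are likewise assumptions, not GPUInspector's actual implementation.

```julia
# Hypothetical stand-in for a benchmark such as peakflops_gpu_matmul: a
# saturating model of throughput vs. matrix size (NOT real measurements),
# so that the sweep below runs without a GPU.
mock_peakflops(; size, verbose = false) = 20.0 * size^3 / (size^3 + 2.0^36)

# Sketch of a scaling sweep (the size range is an assumption): call the
# benchmark for increasing sizes and collect the resulting TFLOP/s.
function scaling_sketch(peakflops_func = mock_peakflops; sizes = 2 .^ (10:14))
    flops = [peakflops_func(; size = s, verbose = false) for s in sizes]
    return sizes, flops
end

sizes, flops = scaling_sketch()
```

Plotting flops against sizes shows where throughput saturates, i.e. from which matrix size onwards the measured value is a reasonable peak estimate.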
GPUInspector.kernel_fma — Method
Dummy kernel doing _kernel_fma_nfmas() many FMAs (default: 100_000).
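The idea of such a dummy kernel is to keep the arithmetic units busy with fused multiply-adds while doing almost no memory traffic. As a rough illustration only, here is a hypothetical CPU analogue (kernel_fma_cpu! is not GPUInspector's kernel, which runs on the GPU via CUDA and may be structured differently) built on Julia's Base fma.

```julia
# Hypothetical CPU analogue of a dummy FMA kernel (the real kernel_fma is a
# CUDA kernel and may differ): every element runs a dependent chain of
# `nfmas` fused multiply-adds, so the inner loop touches only local values.
function kernel_fma_cpu!(out::Vector{T}, a::Vector{T}, b::Vector{T}, c::Vector{T};
                         nfmas::Integer = 100_000) where {T<:AbstractFloat}
    for i in eachindex(out, a, b, c)
        x, y, z = a[i], b[i], c[i]
        for _ in 1:nfmas
            x = fma(x, y, z)  # x*y + z with a single rounding: 1 FMA = 2 FLOPs
        end
        out[i] = x
    end
    return out
end

a, b, c = zeros(Float32, 3), ones(Float32, 3), ones(Float32, 3)
out = kernel_fma_cpu!(similar(a), a, b, c; nfmas = 4)  # each entry: 0 -> 1 -> 2 -> 3 -> 4
```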
GPUInspector.peakflops_gpu_fmas — Method
peakflops_gpu_fmas(; size::Integer=5_000_000, dtype=Float32, nbench=5, nkernel=5, device=CUDA.device(), verbose=true)
Tries to estimate the peak performance of a GPU in TFLOP/s by measuring the time it takes to perform _kernel_fma_nfmas() * size many FMAs on CUDA cores.
Keyword arguments:
device (default: CUDA.device()): CUDA device to be used.
dtype (default: Float32): element type of the vectors.
size (default: 5_000_000): length of the vectors.
nkernel (default: 5): number of kernel calls that make up one benchmarking sample.
nbench (default: 5): number of measurements to perform; the best one is used for the TFLOP/s computation.
verbose (default: true): toggle printing.
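Assuming each sample consists of nkernel launches of a kernel performing _kernel_fma_nfmas() * size FMAs, and counting two FLOPs per FMA, the TFLOP/s figure follows from the fastest sample. The helper fma_tflops below is a hypothetical sketch of that bookkeeping, not GPUInspector's own code; the timings are made up.

```julia
# Hypothetical helper (not GPUInspector's code): one sample runs `nkernel`
# kernel launches, each doing `nfmas * size` FMAs (2 FLOPs per FMA); the
# fastest of the `nbench` timings (in seconds) determines the TFLOP/s.
function fma_tflops(times::AbstractVector{<:Real}; size::Integer = 5_000_000,
                    nfmas::Integer = 100_000, nkernel::Integer = 5)
    flop = 2.0 * nfmas * size * nkernel  # FLOPs per benchmarking sample
    return flop / minimum(times) / 1e12
end

fma_tflops([1.2, 1.0, 1.1, 1.4, 1.3])  # 5e12 FLOPs in 1.0 s = 5.0 TFLOP/s
```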
GPUInspector.peakflops_gpu_wmmas — Method
peakflops_gpu_wmmas()
Tries to estimate the peak performance of a GPU in TFLOP/s by measuring the time it takes to perform _kernel_wmma_nwmmas() many WMMAs on Tensor Cores.
Keyword arguments:
device (default: CUDA.device()): CUDA device to be used.
dtype (default: Float16): element type of the matrices. We currently only support Float16 (Int8, :TensorFloat32, :BFloat16, and Float64 might or might not work).
nkernel (default: 10): number of kernel calls that make up one benchmarking sample.
nbench (default: 5): number of measurements to perform; the best one is used for the TFLOP/s computation.
threads (default: max. threads per block): how many threads to use per block (part of the kernel launch configuration).
blocks (default: 2048): how many blocks to use (part of the kernel launch configuration).
verbose (default: true): toggle printing.
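Because a WMMA is a warp-level operation (all 32 threads of a warp cooperate on one matrix multiply-accumulate), one plausible way to turn threads, blocks and nkernel into a FLOP count is per warp. The sketch below assumes each warp issues nwmmas operations of shape 16 × 16 × 16 per kernel call and that threads defaults to 1024; the helper wmma_tflops, the per-warp structure and the value nwmmas = 10_000 are all assumptions for illustration, not GPUInspector's documented internals.

```julia
# Hypothetical helper: WMMA is a warp-level operation, so this sketch assumes
# every warp (32 threads) issues `nwmmas` m x n x k WMMAs per kernel call,
# each worth 2*m*n*k FLOPs. The per-warp count, the 16x16x16 shape and
# threads = 1024 are assumptions, not GPUInspector's documented internals.
function wmma_tflops(times::AbstractVector{<:Real}; nwmmas::Integer,
                     blocks::Integer = 2048, threads::Integer = 1024,
                     nkernel::Integer = 10, m::Integer = 16, n::Integer = 16, k::Integer = 16)
    nwarps = blocks * threads ÷ 32                       # warps per kernel launch
    flop = 2.0 * m * n * k * nwmmas * nwarps * nkernel  # FLOPs per sample
    return flop / minimum(times) / 1e12
end

wmma_tflops([1.0]; nwmmas = 10_000)  # ≈ 53.69 TFLOP/s for these made-up inputs
```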