GPU Stresstest

Index

GPUInspector.stresstest
GPUInspector.StressTestBatched
GPUInspector.StressTestEnforced
GPUInspector.StressTestFixedIter
GPUInspector.StressTestStoreResults

References

GPUInspector.stresstest — Method

stresstest(device_or_devices)

Run a GPU stress test (matrix multiplication) on one or multiple GPU devices, as specified by the positional argument. If no argument is provided, only the currently active GPU is used.

Keyword arguments:

Choose one of the following (or none):

duration: stress test will take about the given time in seconds. (StressTestBatched)
enforced_duration: stress test will take almost precisely the given time in seconds. (StressTestEnforced)
approx_duration: stress test will hopefully take approximately the given time in seconds. No promises made! (StressTestFixedIter)
niter: stress test will run the given number of matrix-multiplications, however long that will take. (StressTestFixedIter)
mem: number (<:Real) between 0 and 1, indicating the fraction of the available GPU memory that should be used, or a <:UnitPrefixedBytes indicating an absolute memory limit. (StressTestStoreResults)

General settings:

dtype (default: Float32): element type of the matrices
monitoring (default: false): enable automatic monitoring, in which case a MonitoringResults object is returned.
size (default: 2048): matrices of size (size, size) will be used
verbose (default: true): toggle printing of information
parallel (default: true): If true, will (try to) run each GPU test on a different Julia thread. Make sure to have enough Julia threads.
threads (default: nothing): If parallel == true, this argument may be used to specify the Julia threads to use.
clearmem (default: false): If true, we call clear_all_gpus_memory after the stress test.

When duration is specified (i.e. StressTestBatched), there is also:

batch_duration (default: ceil(Int, duration/10)): desired duration of one batch of matmuls.
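The following calls sketch how the keyword arguments above might be combined. Only the keyword names come from this docstring; using CUDA.device() and CUDA.devices() from CUDA.jl to select devices is an assumption, and the chosen values are illustrative.

```julia
using CUDA          # provides CUDA.device() and CUDA.devices()
using GPUInspector

# Stress the currently active GPU for about 10 seconds (StressTestBatched).
stresstest(CUDA.device(); duration=10)

# Run a fixed number of Float16 matmuls on all visible GPUs (StressTestFixedIter).
# Assumption: a vector of devices is accepted as the positional argument.
stresstest(collect(CUDA.devices()); niter=1_000, dtype=Float16, size=4096)

# Enforce the duration and collect monitoring data (StressTestEnforced).
# With monitoring=true, a MonitoringResults object is returned.
results = stresstest(CUDA.device(); enforced_duration=30, monitoring=true, verbose=false)
```

When several devices are tested with parallel=true, Julia should be started with enough threads (for example via julia --threads) so that each GPU gets its own thread.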

GPUInspector.StressTestBatched — Type

GPU stress test (matrix multiplications) in which we try to run for a given time period. We try to keep the CUDA stream continuously busy with matmuls at any point in time. Concretely, we submit batches of matmuls and, after half of the matmuls in a batch have been submitted, we record a CUDA event. On the host, after submitting a batch, we (non-blockingly) synchronize on, i.e. wait for, the CUDA event and, if we haven't exceeded the desired duration already, submit another batch.
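The batching scheme described above can be sketched with CUDA.jl roughly as follows. This is a minimal illustration and not the package's implementation: the function name batched_matmuls_sketch, its keyword arguments, and the fixed batch_size are hypothetical, and a CUDA-capable GPU is required to run it.

```julia
using CUDA, LinearAlgebra

# Hypothetical helper, not part of GPUInspector.
function batched_matmuls_sketch(; duration=5.0, batch_size=100, n=2048)
    A = CUDA.rand(Float32, n, n)
    B = CUDA.rand(Float32, n, n)
    C = CUDA.zeros(Float32, n, n)
    halfway = CuEvent()
    nbatches = 0
    t0 = time()
    while time() - t0 < duration
        for i in 1:batch_size
            mul!(C, A, B)                                # launched asynchronously on the stream
            i == batch_size ÷ 2 && CUDA.record(halfway)  # mark the middle of the batch
        end
        nbatches += 1
        # Wait only for the first half of the batch; the second half is still
        # queued, so the stream stays busy while the next batch is submitted.
        CUDA.synchronize(halfway)
    end
    CUDA.synchronize()                                   # drain the remaining work
    return nbatches
end
```

Because the host waits on the mid-batch event rather than on the whole stream, the total run time can overshoot the requested duration by up to roughly one batch.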

GPUInspector.StressTestEnforced — Type

GPU stress test (matrix multiplications) in which we run almost precisely for a given time period (duration is enforced).

GPUInspector.StressTestFixedIter — Type

GPU stress test (matrix multiplications) in which we run for a given number of iterations, or try to run for a given time period (with potentially high uncertainty!). In the latter case, we estimate how long a synced matmul takes and set niter accordingly.
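The idea of deriving niter from a timed matmul can be illustrated as below. This is a sketch under stated assumptions: estimate_niter is a hypothetical helper, and CPU matrices stand in for GPU arrays so that the snippet runs without a GPU. On a GPU, the timed matmul would need to be synchronized before the clock is stopped.

```julia
using LinearAlgebra

# Hypothetical helper, not part of GPUInspector.
function estimate_niter(approx_duration; n=256, dtype=Float32)
    A = rand(dtype, n, n)
    B = rand(dtype, n, n)
    C = zeros(dtype, n, n)
    mul!(C, A, B)                   # warm-up, so compilation is not timed
    t = @elapsed mul!(C, A, B)      # time of a single matmul in seconds
    t = max(t, 1e-9)                # guard against a zero timer reading
    return max(1, ceil(Int, approx_duration / t))
end

niter = estimate_niter(0.5)
```

Since the estimate rests on a single timing, clock throttling or a noisy measurement carries over directly into the actual run time, which is why the duration is only approximate.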

GPUInspector.StressTestStoreResults — Type

GPU stress test (matrix multiplications) in which we store all matmul results and try to run as many iterations as possible for a certain memory limit (default: 90% of free memory).

This stress test is somewhat inspired by gpu-burn by Ville Timonen.
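As a rough illustration of how a memory limit translates into a number of stored results, the arithmetic might look as follows. The helper max_stored_results is hypothetical, and the calculation ignores the input matrices and any library workspace, so it is an upper bound rather than the package's actual accounting.

```julia
# Hypothetical helper, not part of GPUInspector.
function max_stored_results(free_bytes; mem=0.9, n=2048, dtype=Float32)
    budget = mem * free_bytes               # bytes that may be used for results
    bytes_per_matrix = n^2 * sizeof(dtype)  # one (n, n) result matrix
    return floor(Int, budget / bytes_per_matrix)
end

# 16 GiB free, 90% budget, 2048 x 2048 Float32 results (16 MiB each)
max_stored_results(16 * 2^30)  # → 921
```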