GPU Stresstest
Index
GPUInspector.stresstest
GPUInspector.StressTestBatched
GPUInspector.StressTestEnforced
GPUInspector.StressTestFixedIter
GPUInspector.StressTestStoreResults
References
GPUInspector.stresstest — Method
stresstest(device_or_devices)
Run a GPU stress test (matrix multiplication) on one or multiple GPU devices, as specified by the positional argument. If no argument is provided, only the currently active GPU is used.
Keyword arguments:
Choose one of the following (or none):
duration: stress test will take about the given time in seconds. (StressTestBatched)
enforced_duration: stress test will take almost precisely the given time in seconds. (StressTestEnforced)
approx_duration: stress test will hopefully take approximately the given time in seconds. No promises made! (StressTestFixedIter)
niter: stress test will run the given number of matrix-multiplications, however long that will take. (StressTestFixedIter)
mem: number (<:Real) between 0 and 1, indicating the fraction of the available GPU memory that should be used, or a <:UnitPrefixedBytes indicating an absolute memory limit. (StressTestStoreResults)
General settings:
dtype (default: Float32): element type of the matrices
monitoring (default: false): enable automatic monitoring, in which case a MonitoringResults object is returned.
size (default: 2048): matrices of size (size, size) will be used
verbose (default: true): toggle printing of information
parallel (default: true): If true, will (try to) run each GPU test on a different Julia thread. Make sure to have enough Julia threads.
threads (default: nothing): If parallel == true, this argument may be used to specify the Julia threads to use.
clearmem (default: false): If true, we call clear_all_gpus_memory after the stress test.
When duration is specified (i.e. StressTestBatched) there is also:
batch_duration (default: ceil(Int, duration/10)): desired duration of one batch of matmuls.
GPUInspector.StressTestBatched — Type
GPU stress test (matrix multiplications) in which we try to run for a given time period. We try to keep the CUDA stream continuously busy with matmuls at any point in time. Concretely, we submit batches of matmuls and, after half of them, we record a CUDA event. On the host, after submitting a batch, we (non-blockingly) synchronize on, i.e. wait for, the CUDA event and, if we haven't exceeded the desired duration already, submit another batch.
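The host-side control flow described above can be sketched on the CPU. This is only an illustration under stated assumptions, not GPUInspector's implementation: run_batched and batch_size are hypothetical names, the matmuls run synchronously via LinearAlgebra.mul!, and the CUDA event that lets the host queue the next batch early is left out.

```julia
using LinearAlgebra

# Hypothetical CPU stand-in for the batched host loop (not GPUInspector's code).
function run_batched(; duration=0.5, batch_size=10, size=128, dtype=Float32)
    A = rand(dtype, size, size)
    B = rand(dtype, size, size)
    C = similar(A)
    nbatches = 0
    t0 = time()
    # Submit another batch only while the desired duration is not yet exceeded.
    while time() - t0 < duration
        for _ in 1:batch_size
            mul!(C, A, B)  # one matmul of the current batch
        end
        nbatches += 1
    end
    return (; nbatches, elapsed = time() - t0)
end

result = run_batched(duration=0.2, batch_size=5, size=64)
println("ran ", result.nbatches, " batches in ", round(result.elapsed; digits=3), " s")
```

Because the time check happens only between batches, the run can overshoot the requested duration by up to one batch, which is why the duration is met only approximately.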
GPUInspector.StressTestEnforced — Type
GPU stress test (matrix multiplications) in which we run almost precisely for a given time period (duration is enforced).
GPUInspector.StressTestFixedIter — Type
GPU stress test (matrix multiplications) in which we run for a given number of iterations, or try to run for a given time period (with potentially high uncertainty!). In the latter case, we estimate how long a synced matmul takes and set niter accordingly.
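A minimal sketch of that estimate, assuming a single timed matmul is used to pick the iteration count. The helper estimate_niter is hypothetical and runs on the CPU; on a GPU the timed call would additionally have to be synchronized before the clock is read.

```julia
using LinearAlgebra

# Hypothetical helper: time one completed matmul and derive an iteration count.
function estimate_niter(approx_duration; size=256, dtype=Float32)
    A = rand(dtype, size, size)
    B = rand(dtype, size, size)
    C = similar(A)
    mul!(C, A, B)                               # warm-up, keeps compilation out of the timing
    t = max(@elapsed(mul!(C, A, B)), 1e-9)      # seconds per matmul, guarded against zero
    return max(1, ceil(Int, approx_duration / t))
end

niter = estimate_niter(0.5; size=128)
println("estimated niter = ", niter)
```

A single timing sample is noisy, and the device may run faster or slower once it is under sustained load, so the resulting run time can deviate considerably from approx_duration.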
GPUInspector.StressTestStoreResults — Type
GPU stress test (matrix multiplications) in which we store all matmul results and try to run as many iterations as possible for a certain memory limit (default: 90% of free memory).
This stress test is somewhat inspired by gpu-burn by Ville Timonen.
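To get a feeling for how the memory limit bounds the number of stored results, the arithmetic can be sketched as follows. The helper matrices_in_budget and the 16 GiB of free memory are assumptions for illustration only; the actual test also has to hold its input matrices, which this count ignores.

```julia
# Hypothetical helper: how many (size, size) matrices fit into a fraction of free memory.
function matrices_in_budget(mem::Real, free_bytes::Integer; size=2048, dtype=Float32)
    0 < mem <= 1 || throw(ArgumentError("mem must be a fraction in (0, 1]"))
    budget = floor(Int, mem * free_bytes)        # byte budget derived from the fraction
    bytes_per_matrix = size^2 * sizeof(dtype)    # 2048^2 * 4 bytes = 16 MiB for the defaults
    return budget ÷ bytes_per_matrix
end

free_bytes = 16 * 2^30                           # assumed: 16 GiB of free GPU memory
n = matrices_in_budget(0.9, free_bytes)
println("result matrices that fit: ", n)
```

With the default size and dtype, each result occupies 16 MiB, so the number of storable results, and with it the number of iterations, grows linearly with the memory limit.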