GPU Monitoring
Index
GPUInspector.ismonitoring
GPUInspector.livemonitor_powerusage
GPUInspector.livemonitor_something
GPUInspector.livemonitor_temperature
GPUInspector.monitoring_start
GPUInspector.monitoring_stop
GPUInspector.plot_monitoring_results
GPUInspector.savefig_monitoring_results
References
GPUInspector.ismonitoring — Method
Checks whether monitoring is currently running.
GPUInspector.livemonitor_powerusage — Method
livemonitor_powerusage(duration) -> times, powerusage
Monitor the power usage of GPU(s) (in Watts) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the power usage as a Vector{Vector{Float64}}.
For general keyword arguments, see livemonitor_something.
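The returned vectors can be post-processed with plain Julia. The following is a minimal sketch that computes the mean and peak power per GPU. It uses made-up sample data rather than real measurements, and it assumes that the outer vector holds one entry per sample and each inner vector holds one value per GPU; the docstring above does not state which way the nesting goes, so check this against your installed version.

```julia
using Statistics

# Made-up sample data in the documented return shapes (not real measurements).
# Assumption: one inner vector per sample, one value per GPU inside it.
times = [0.0, 1.0, 2.0, 3.0]
powerusage = [[50.0, 60.0], [70.0, 65.0], [90.0, 62.0], [80.0, 61.0]]

# Stack the samples into a (samples x GPUs) matrix.
P = permutedims(reduce(hcat, powerusage))

# Reduce over the sample dimension to get one number per GPU.
mean_power = vec(mean(P; dims=1))
peak_power = vec(maximum(P; dims=1))

println("mean power per GPU (W): ", mean_power)
println("peak power per GPU (W): ", peak_power)
```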
GPUInspector.livemonitor_something — Method
livemonitor_something(f, duration) -> times, values
Monitor some property of GPU(s), as specified through the function f, over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the values as a Vector{Vector{Float64}}.
The function f will be called on a vector of devices (CuDevices or NVML.Devices) and should return a vector of Float64 values.
Keyword arguments:
- freq (default: 1): polling rate in Hz.
- devices (default: NVML.devices()): CuDevices or NVML.Devices to consider.
- plot (default: false): Create a unicode plot after the monitoring.
- liveplot (default: false): Create and update a unicode plot during the monitoring. Use optional ylims to specify fixed y limits.
- title (default: ""): Title used in unicode plots.
- ylabel (default: "Values"): y label used in unicode plots.
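To make the contract between f, duration, and freq concrete, here is a minimal, GPU-free sketch of a polling loop with the same input and output shapes. It is not GPUInspector's implementation: poll_something and fake_probe are hypothetical names, the devices are plain integers standing in for CuDevices or NVML.Devices, and collecting one inner vector per sample is an assumption.

```julia
# Hypothetical stand-in for livemonitor_something (not the library's code).
# f receives a vector of devices and must return a vector of Float64 values.
function poll_something(f, duration; freq=1, devices=[0, 1])
    ts = Float64[]            # relative sample times in seconds
    vals = Vector{Float64}[]  # assumption: one inner vector per sample
    t0 = time()
    while true
        t = time() - t0
        t > duration && break
        push!(ts, t)
        push!(vals, f(devices))
        sleep(1 / freq)       # freq is the polling rate in Hz
    end
    return ts, vals
end

# Fake probe: pretends each device reports 40.0 plus its index.
fake_probe(devices) = [40.0 + d for d in devices]

ts, vals = poll_something(fake_probe, 0.5; freq=10, devices=[0, 1])
println("collected ", length(ts), " samples; first sample: ", first(vals))
```

With GPUInspector loaded, the same shape of function (vector of devices in, vector of Float64 out) is what would be passed as f to livemonitor_something.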
GPUInspector.livemonitor_temperature — Method
livemonitor_temperature(duration) -> times, temperatures
Monitor the temperature of GPU(s) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the temperatures as a Vector{Vector{Float64}}.
For general keyword arguments, see livemonitor_something.
GPUInspector.monitoring_start — Method
monitoring_start(; freq=1, devices=CUDA.devices(), thread=Threads.nthreads(), verbose=true)
Start monitoring of GPU temperature, utilization, power usage, etc.
Keyword arguments:
- freq (default: 1): polling rate in Hz.
- devices (default: CUDA.devices()): CuDevices or NVML.Devices to monitor.
- thread (default: Threads.nthreads()): id of the Julia thread that should run the monitoring.
- verbose (default: true): toggle verbose output.
See also monitoring_stop.
GPUInspector.monitoring_stop — Method
monitoring_stop(; verbose=true) -> results
Stops the GPU monitoring and returns the measured values. Specifically, results is a named tuple with the following keys:
- time: the (relative) times at which we measured
- temperature, power, compute, mem: the values measured at those times
See also monitoring_start and plot_monitoring_results.
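Because monitoring_start and monitoring_stop come as a pair, it can be convenient to wrap them so that monitoring is stopped even if the workload throws. The sketch below shows one way to do that with try/finally. The helper with_monitoring is hypothetical and not part of GPUInspector; it takes the start and stop functions as arguments so that the example runs without a GPU, using stand-in functions and a made-up result.

```julia
# Hypothetical helper (not part of GPUInspector): run `work` between a
# start/stop pair and return whatever `stop` returns.
function with_monitoring(work, start, stop)
    start()
    results = nothing
    try
        work()
    finally
        # Runs whether or not work() throws; an exception still propagates.
        results = stop()
    end
    return results
end

# Stand-ins so the sketch runs anywhere; the returned tuple is made up.
events = String[]
fake_start() = push!(events, "start")
fake_stop() = (push!(events, "stop"); (time = [0.0, 1.0],))

r = with_monitoring(() -> push!(events, "work"), fake_start, fake_stop)
println(events)
println(r.time)
```

With GPUInspector loaded, the call would presumably look like with_monitoring(my_workload, monitoring_start, monitoring_stop), where my_workload is a placeholder for a zero-argument function that runs the GPU work.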
GPUInspector.plot_monitoring_results — Function
plot_monitoring_results(r::MonitoringResults, symbols=keys(r.results))
Plot the quantities specified through symbols of a MonitoringResults object. Will generate a textual in-terminal / in-logfile plot using UnicodePlots.jl.
GPUInspector.savefig_monitoring_results — Function
savefig_monitoring_results(r::MonitoringResults, symbols=keys(r.results))
Save plots of the quantities specified through symbols of a MonitoringResults object to disk. Note: Only available if CairoMakie.jl is loaded next to GPUInspector.jl.