GPU Monitoring
Index
GPUInspector.ismonitoring
GPUInspector.livemonitor_powerusage
GPUInspector.livemonitor_something
GPUInspector.livemonitor_temperature
GPUInspector.monitoring_start
GPUInspector.monitoring_stop
GPUInspector.plot_monitoring_results
GPUInspector.savefig_monitoring_results
References
GPUInspector.ismonitoring — Method
Checks if we are currently monitoring.
GPUInspector.livemonitor_powerusage — Method
livemonitor_powerusage(duration) -> times, powerusage
Monitor the power usage of GPU(s) (in Watts) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the power usage as a Vector{Vector{Float64}}.
For general keyword arguments, see livemonitor_something.
GPUInspector.livemonitor_something — Method
livemonitor_something(f, duration) -> times, values
Monitor some property of GPU(s), as specified through the function f, over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the monitored values as a Vector{Vector{Float64}}.
The function f will be called on a vector of devices (CuDevices or NVML.Devices) and should return a vector of Float64 values.
Keyword arguments:
freq (default: 1): polling rate in Hz.
devices (default: NVML.devices()): CuDevices or NVML.Devices to consider.
plot (default: false): Create a unicode plot after the monitoring.
liveplot (default: false): Create and update a unicode plot during the monitoring. Use the optional ylims to specify fixed y limits.
title (default: ""): Title used in unicode plots.
ylabel (default: "Values"): y label used in unicode plots.
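The contract described above (f receives the whole device vector, is polled at freq Hz, and the results are collected alongside relative timestamps) can be illustrated without a GPU. The following is a minimal, self-contained sketch: poll_property, fake_temperature and the symbol "devices" are hypothetical stand-ins, not part of GPUInspector, and the layout of one inner vector per sample is an assumption.

```julia
# Hypothetical stand-in for the polling contract; NOT GPUInspector's implementation.
function poll_property(f, duration; freq=1, devices=[:gpu0, :gpu1])
    times = Float64[]
    vals = Vector{Float64}[]
    t0 = time()
    while (t = time() - t0) < duration
        push!(times, t)          # time of this sample, relative to the start
        push!(vals, f(devices))  # f returns one Float64 per device
        sleep(1 / freq)          # wait roughly one polling interval
    end
    return times, vals
end

# f receives the whole device vector and returns a Vector{Float64}.
# The numbers here are made up; a real f would query each device.
fake_temperature(devs) = Float64[40 + 5i for i in eachindex(devs)]

times, samples = poll_property(fake_temperature, 0.5; freq=10)
println(length(times), " samples collected")
```

With the real function, f would be passed in the same way, for example with do-block syntax, and would query the actual devices instead of returning fixed numbers.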
GPUInspector.livemonitor_temperature — Method
livemonitor_temperature(duration) -> times, temperatures
Monitor the temperature of GPU(s) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the temperatures as a Vector{Vector{Float64}}.
For general keyword arguments, see livemonitor_something.
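As a usage sketch for the two convenience functions above: the calls below use only the documented signatures and the freq and plot keyword arguments. The durations and polling rate are arbitrary. The example assumes a machine with an NVIDIA GPU and a working CUDA.jl installation; depending on the GPUInspector version, loading CUDA explicitly may also be required.

```julia
using GPUInspector  # assumption: an NVIDIA GPU and a functional CUDA.jl setup are available
# using CUDA        # may be needed in addition, depending on the GPUInspector version

# Sample the temperature for 5 seconds at 2 Hz.
times, temperatures = livemonitor_temperature(5; freq=2)

# Same pattern for the power usage (in Watts), printing a unicode plot afterwards.
ptimes, power = livemonitor_powerusage(5; freq=2, plot=true)

println(length(times), " temperature samples, ", length(ptimes), " power samples")
```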
GPUInspector.monitoring_start — Method
monitoring_start(; devices=CUDA.devices(), verbose=true)
Start monitoring of GPU temperature, utilization, power usage, etc.
Keyword arguments:
freq (default: 1): polling rate in Hz.
devices (default: CUDA.devices()): CuDevices or NVML.Devices to monitor.
thread (default: Threads.nthreads()): id of the Julia thread that should run the monitoring.
verbose (default: true): toggle verbose output.
See also monitoring_stop.
GPUInspector.monitoring_stop — Method
monitoring_stop(; verbose=true) -> results
Stops the GPU monitoring and returns the measured values. Specifically, results is a named tuple with the following keys:
time: the (relative) times at which we measured
temperature, power, compute, mem
See also monitoring_start and plot_monitoring_results.
GPUInspector.plot_monitoring_results — Function
plot_monitoring_results(r::MonitoringResults, symbols=keys(r.results))
Plot the quantities specified through symbols of a MonitoringResults object. Will generate a textual in-terminal / in-logfile plot using UnicodePlots.jl.
GPUInspector.savefig_monitoring_results — Function
savefig_monitoring_results(r::MonitoringResults, symbols=keys(r.results))
Save plots of the quantities specified through symbols of a MonitoringResults object to disk. Note: Only available if CairoMakie.jl is loaded next to GPUInspector.jl.
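The background-monitoring functions can be combined into a start / work / stop / plot workflow. The sketch below uses only the functions and keyword arguments documented on this page. It assumes an NVIDIA GPU with a working CUDA.jl setup, that Julia was started with more than one thread (since monitoring runs on a separate Julia thread), and that the value returned by monitoring_stop can be passed directly to plot_monitoring_results. The sleep call is a placeholder for the GPU workload of interest.

```julia
using GPUInspector  # assumption: NVIDIA GPU available, Julia started with multiple threads

# Begin background polling at 2 Hz on the default monitoring thread.
monitoring_start(; freq=2, verbose=false)
println("monitoring active: ", ismonitoring())

sleep(3)  # placeholder: run the GPU workload you want to observe here

# Stop polling and collect the measurements.
results = monitoring_stop(; verbose=false)

# Unicode plots of all recorded quantities, printed to the terminal.
plot_monitoring_results(results)
```

Writing the plots to disk with savefig_monitoring_results follows the same pattern but additionally requires CairoMakie.jl to be loaded.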