GPU Monitoring
Index
- GPUInspector.ismonitoring
- GPUInspector.livemonitor_powerusage
- GPUInspector.livemonitor_something
- GPUInspector.livemonitor_temperature
- GPUInspector.load_monitoring_results
- GPUInspector.monitoring_start
- GPUInspector.monitoring_stop
- GPUInspector.plot_monitoring_results
- GPUInspector.save_monitoring_results
- GPUInspector.savefig_monitoring_results
- GPUInspector.MonitoringResults
References
GPUInspector.livemonitor_powerusage (Method)
livemonitor_powerusage(duration) -> times, powerusage
Monitor the power usage of GPU(s), in watts, over a given time period specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the power usage as a Vector{Vector{Float64}}.
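Here is a minimal post-processing sketch for the returned power data. It assumes that each inner vector holds one reading per monitored device, which follows from the contract of the f function in livemonitor_something. The sample numbers are made up and stand in for real measurements. The sketch stacks the readings into a time × device matrix, then computes each device's peak power and an energy estimate using the trapezoidal rule. The helper names `to_matrix` and `energy_joules` are hypothetical and are not part of GPUInspector.

```julia
# Synthetic stand-ins for `times, powerusage = livemonitor_powerusage(duration)`;
# the numbers are illustrative only.
times = [0.0, 1.0, 2.0, 3.0]                  # relative times in seconds
powerusage = [[100.0, 50.0], [200.0, 60.0],   # one inner vector per time point,
              [200.0, 70.0], [100.0, 50.0]]   # one entry per device (assumed)

# Hypothetical helper: stack per-time-point vectors into a (time × device) matrix.
to_matrix(vals::Vector{Vector{Float64}}) = permutedims(reduce(hcat, vals))

# Hypothetical helper: energy per device in joules, integrating power (W)
# over time (s) with the trapezoidal rule.
function energy_joules(t::Vector{Float64}, P::Matrix{Float64})
    dt = diff(t)                                 # interval lengths, n-1 entries
    avg = (P[1:end-1, :] .+ P[2:end, :]) ./ 2    # mean power on each interval
    return vec(sum(avg .* dt; dims=1))           # one value per device
end

P = to_matrix(powerusage)
peak = vec(maximum(P; dims=1))   # peak power per device in W
E = energy_joules(times, P)      # energy per device in J
```

With these sample values, device 1 peaks at 200 W and uses about 500 J over the 3 seconds.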
For general keyword arguments, see livemonitor_something.
GPUInspector.livemonitor_something (Method)
livemonitor_something(f, duration) -> times, values
Monitor some property of GPU(s), specified through the function f, over a given time period specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the measured values as a Vector{Vector{Float64}}, with one inner vector per time point.
The function f will be called on a vector of devices and should return a vector of Float64 values.
Keyword arguments:
- freq (default: 1): polling rate in Hz.
- devices (default: e.g. NVML.devices()): devices to monitor.
- plot (default: false): create a unicode plot after the monitoring.
- liveplot (default: false): create and update a unicode plot during the monitoring. Use the optional ylims keyword to specify fixed y limits.
- title (default: ""): title used in unicode plots.
- ylabel (default: "Values"): y label used in unicode plots.
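The following sketch illustrates the polling loop described by f, duration, freq and devices. It is not GPUInspector's implementation. The function name `poll_something` is hypothetical, the plotting keywords are left out, and devices can be any vector, so no GPU is needed. A stand-in f replaces a real sensor query, and it follows the documented contract: it takes the vector of devices and returns one Float64 per device.

```julia
# Hypothetical sketch of a fixed-rate polling loop in the spirit of
# livemonitor_something(f, duration; freq, devices). Not GPUInspector's code.
function poll_something(f, duration::Real; freq::Real=1, devices=[1, 2])
    times = Float64[]
    values = Vector{Vector{Float64}}()
    t0 = time()
    while true
        t = time() - t0
        t > duration && break
        push!(times, t)                        # relative time in seconds
        push!(values, Float64.(f(devices)))    # f: devices -> one value per device
        sleep(1 / freq)                        # wait one period (no drift correction)
    end
    return times, values
end

# Usage with a stand-in "sensor" that reports each device id as its value.
times, values = poll_something(devs -> Float64.(devs), 0.5; freq=10, devices=[1, 2, 3])
```

Because the loop sleeps a fixed 1/freq after each sample, the time spent inside f adds up. The effective rate therefore falls slightly below freq. This is one reason the sketch returns the measured relative times instead of assuming an evenly spaced grid.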
GPUInspector.livemonitor_temperature (Method)
livemonitor_temperature(duration) -> times, temperatures
Monitor the temperature of GPU(s) over a given time period specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the temperatures as a Vector{Vector{Float64}}.
For general keyword arguments, see livemonitor_something.
GPUInspector.monitoring_start (Method)
monitoring_start(; devices, kwargs...)
Start monitoring GPU temperature, utilization, power usage, etc.
Keyword arguments:
- freq (default: 1): polling rate in Hz.
- devices (default: e.g. CUDA.devices()): GPU devices to monitor.
- thread (default: Threads.nthreads()): id of the Julia thread that should run the monitoring.
- verbose (default: true): toggle verbose output.
See also monitoring_stop.
GPUInspector.monitoring_stop (Method)
monitoring_stop(; verbose=true) -> results
Stop the GPU monitoring and return the measured values as a MonitoringResults object.
See also monitoring_start and plot_monitoring_results.
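The sketch below shows one way a start/stop monitoring pair can work. A background task polls until a shared atomic flag is cleared, and stopping waits for that task before handing back the collected data. It is an illustration only. `MonitorState`, `start_monitor!`, `is_monitoring` and `stop_monitor!` are hypothetical names and not GPUInspector's API. Where monitoring_start lets you pin the work to a specific thread, this sketch uses Threads.@spawn and leaves the choice to the scheduler.

```julia
# Hypothetical start/stop monitoring pattern; not part of GPUInspector.
mutable struct MonitorState
    running::Threads.Atomic{Bool}
    task::Union{Task,Nothing}
    times::Vector{Float64}
    values::Vector{Vector{Float64}}
end
MonitorState() = MonitorState(Threads.Atomic{Bool}(false), nothing, Float64[], Vector{Float64}[])

function start_monitor!(s::MonitorState, f, devices; freq::Real=1)
    s.running[] && error("already monitoring")
    empty!(s.times); empty!(s.values)
    s.running[] = true
    t0 = time()
    s.task = Threads.@spawn begin
        while s.running[]                        # poll until the flag is cleared
            push!(s.times, time() - t0)
            push!(s.values, Float64.(f(devices)))
            sleep(1 / freq)
        end
    end
    return nothing
end

is_monitoring(s::MonitorState) = s.running[]

function stop_monitor!(s::MonitorState)
    s.task === nothing && error("not monitoring")
    s.running[] = false                          # ask the polling loop to exit
    wait(s.task)                                 # let it finish its current iteration
    return copy(s.times), copy(s.values)
end

# Usage: monitor two stand-in "devices" while the main task does other work.
s = MonitorState()
start_monitor!(s, devs -> Float64.(devs), [1, 2]; freq=20)
sleep(0.3)                                       # the monitored workload would run here
times, values = stop_monitor!(s)
```

Calling wait before returning means the caller never reads the vectors while the polling task is still writing to them.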
GPUInspector.savefig_monitoring_results (Function)
savefig_monitoring_results(r::MonitoringResults, symbols=keys(r.results); ext=:pdf)
Save plots of the quantities specified through symbols of a MonitoringResults object to disk. Note: only available if CairoMakie.jl is loaded alongside GPUInspector.jl.
GPUInspector.MonitoringResults (Type)
Struct holding the results of monitoring: the time points (times), the monitored devices (devices), and a dictionary (results) that holds the vector-valued measurements of the different quantities, keyed by symbol, at each of the time points.
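To make the described layout concrete, here is a stand-in struct with the same three fields. The concrete field types are assumptions, not GPUInspector's definition. The name `MyMonitoringResults`, the helper `series` and the `:temperature` key are all hypothetical, and the values are made up. The sketch shows how one device's time series can be pulled out of the per-time-point vectors. keys(r.results) is also the default symbols argument of plot_monitoring_results and savefig_monitoring_results.

```julia
# Hypothetical stand-in mirroring the documented fields of MonitoringResults;
# the field types are assumptions.
struct MyMonitoringResults
    times::Vector{Float64}
    devices::Vector{Int}                           # stand-in for GPU device handles
    results::Dict{Symbol,Vector{Vector{Float64}}}  # quantity => one vector per time point
end

# Extract the time series of quantity `sym` for the device at position `i`.
series(r::MyMonitoringResults, sym::Symbol, i::Integer) = [v[i] for v in r.results[sym]]

r = MyMonitoringResults(
    [0.0, 1.0, 2.0],
    [0, 1],
    Dict(:temperature => [[40.0, 45.0], [42.0, 47.0], [44.0, 49.0]]),  # made-up values
)

series(r, :temperature, 2)   # readings of the second device over time
```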
GPUInspector.ismonitoring (Method)
Check whether monitoring is currently active.
GPUInspector.plot_monitoring_results (Function)
plot_monitoring_results(r::MonitoringResults, symbols=keys(r.results))
Plot the quantities specified through symbols of a MonitoringResults object. Generates a text-based plot, suitable for a terminal or a log file, using UnicodePlots.jl.
GPUInspector.load_monitoring_results (Method)
Given an HDF5 file created with save_monitoring_results, restore the saved monitoring results (i.e. the output of monitoring_stop).
GPUInspector.save_monitoring_results (Method)
save_monitoring_results(filename::String, r::MonitoringResults; overwrite=false)
Store the given MonitoringResults (output of monitoring_stop) to disk as an HDF5 file named filename.