GPU Monitoring

References

GPUInspector.livemonitor_powerusage - Method
livemonitor_powerusage(duration) -> times, powerusage

Monitor the power usage of GPU(s) (in Watts) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the power usage as a Vector{Vector{Float64}}.

For general keyword arguments, see livemonitor_something.
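
As an illustration of working with the returned values, the following sketch estimates the energy consumed per device by integrating power over time with the trapezoidal rule. It uses synthetic data instead of a real GPU query, the helper `energy_per_device` is hypothetical (not part of GPUInspector), and it assumes that each inner vector of `powerusage` holds one sample across all monitored devices.

```julia
# Synthetic stand-in for the return values of livemonitor_powerusage;
# no GPU is queried here. Layout assumption: powerusage[i][d] is the power
# draw (in W) of device d at time times[i].
times = [0.0, 1.0, 2.0, 3.0]
powerusage = [[100.0, 50.0], [110.0, 50.0], [120.0, 50.0], [110.0, 50.0]]

# Hypothetical helper: trapezoidal estimate of the energy (in Joules) per device.
function energy_per_device(times, powerusage)
    ndev = length(first(powerusage))
    energy = zeros(ndev)
    for i in 2:length(times)
        dt = times[i] - times[i-1]
        for d in 1:ndev
            energy[d] += dt * (powerusage[i][d] + powerusage[i-1][d]) / 2
        end
    end
    return energy
end

energy = energy_per_device(times, powerusage)
```

With real measurements, `times` and `powerusage` would come from a call such as `livemonitor_powerusage(10)`; if the inner vectors turn out to be organized per device rather than per sample, the two indices in the helper need to be swapped.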

GPUInspector.livemonitor_something - Method
livemonitor_something(f, duration) -> times, values

Monitor some property of GPU(s), as specified through the function f, over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the monitored values as a Vector{Vector{Float64}}.

The function f will be called on a vector of devices and should return a vector of Float64 values.

Keyword arguments:

  • freq (default: 1): polling rate in Hz.
  • devices (default: e.g. NVML.devices()): Devices to monitor.
  • plot (default: false): Create a unicode plot after the monitoring.
  • liveplot (default: false): Create and update a unicode plot during the monitoring. Use optional ylims to specify fixed y limits.
  • title (default: ""): Title used in unicode plots.
  • ylabel (default: "Values"): y label used in unicode plots.

See: livemonitor_temperature, livemonitor_powerusage
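
The contract described above (f receives a vector of devices and returns one Float64 per device, sampled at roughly freq Hz for duration seconds) can be sketched with a plain polling loop. This is a minimal sketch under stated assumptions, not GPUInspector's implementation: `poll_sketch` and `fake_reading` are hypothetical names, the devices are placeholder strings, and no plotting is done.

```julia
# Hypothetical sketch of a polling loop; not GPUInspector's implementation.
function poll_sketch(f, duration; freq=1, devices=["gpu0", "gpu1"])
    times = Float64[]
    vals = Vector{Float64}[]
    t0 = time()
    while true
        t = time() - t0
        t > duration && break
        push!(times, t)            # time relative to the start of monitoring
        push!(vals, f(devices))    # one Float64 per device
        sleep(1 / freq)            # wait roughly one polling interval
    end
    return times, vals
end

# Placeholder for a real device query: returns a constant value per device.
fake_reading(devices) = fill(42.0, length(devices))

sample_times, samples = poll_sketch(fake_reading, 0.5; freq=10)
```

With the real function, `fake_reading` would be replaced by a function that queries each device, and the devices keyword would be left at its default or set to actual device handles.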

GPUInspector.livemonitor_temperature - Method
livemonitor_temperature(duration) -> times, temperatures

Monitor the temperature of GPU(s) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the temperatures as a Vector{Vector{Float64}}.

For general keyword arguments, see livemonitor_something.

GPUInspector.monitoring_start - Method
monitoring_start(; devices, kwargs...)

Start monitoring of GPU temperature, utilization, power usage, etc.

Keyword arguments:

  • freq (default: 1): polling rate in Hz.
  • devices (default: e.g. CUDA.devices()): GPU devices to monitor.
  • thread (default: Threads.nthreads()): id of the Julia thread that should run the monitoring.
  • verbose (default: true): toggle verbose output.

See also monitoring_stop.

GPUInspector.savefig_monitoring_results - Function
savefig_monitoring_results(r::MonitoringResults, symbols=keys(r.results); ext=:pdf)

Save plots of the quantities specified through symbols of a MonitoringResults object to disk. The file format is selected via the ext keyword (default: :pdf). Note: Only available if CairoMakie.jl is loaded alongside GPUInspector.jl.

GPUInspector.MonitoringResults - Type

Struct to hold the results of monitoring. This includes the time points (times), the monitored devices (devices), as well as a dictionary holding the (vector-)values of different quantities (identified by symbols) at each of the time points.
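
To make the described layout concrete, here is a hypothetical mirror of such a struct filled with synthetic data. The field names times, devices, and results follow the description above; the concrete field types, the struct name `MonitoringResultsSketch`, the helper `series`, and the symbols `:temperature` and `:power` are assumptions for illustration only and may differ from the real type.

```julia
# Hypothetical mirror of the described layout; the real type lives in
# GPUInspector and its field types may differ.
struct MonitoringResultsSketch
    times::Vector{Float64}
    devices::Vector{String}
    results::Dict{Symbol,Vector{Vector{Float64}}}
end

# Synthetic data: three time points, two devices, two quantities.
r = MonitoringResultsSketch(
    [0.0, 1.0, 2.0],
    ["GPU 0", "GPU 1"],
    Dict(
        :temperature => [[40.0, 41.0], [42.0, 41.0], [45.0, 42.0]],
        :power => [[100.0, 60.0], [120.0, 60.0], [150.0, 65.0]],
    ),
)

# Time series of one quantity for one device (device index d).
series(r, sym, d) = [sample[d] for sample in r.results[sym]]

temps_gpu0 = series(r, :temperature, 1)
```

Storing one vector per time point keeps each quantity aligned with times, so extracting the series for a single device is a simple comprehension over the samples.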

GPUInspector.plot_monitoring_results - Function
plot_monitoring_results(r::MonitoringResults, symbols=keys(r.results))

Plot the quantities specified through symbols of a MonitoringResults object. Will generate a textual in-terminal / in-logfile plot using UnicodePlots.jl.
