GPU Monitoring

References

GPUInspector.livemonitor_powerusage - Method
livemonitor_powerusage(duration) -> times, powerusage

Monitor the power usage of the GPU(s), in watts, over a time period of duration seconds. Returns the (relative) times as a Vector{Float64} and the power usage as a Vector{Vector{Float64}}.

For general keyword arguments, see livemonitor_something.
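
As a minimal usage sketch, assuming GPUInspector.jl and CUDA.jl are installed and an NVIDIA GPU is available, the call below polls the power usage for 10 seconds at 2 Hz and requests a unicode plot afterwards. The duration and freq values are arbitrary choices for illustration.

```julia
using GPUInspector
using CUDA  # assumption: CUDA.jl provides the GPU backend

# Poll power usage for 10 seconds at 2 Hz; plot=true requests a unicode plot at the end.
times, powerusage = livemonitor_powerusage(10; freq=2, plot=true)

# times is a Vector{Float64}; powerusage is a Vector{Vector{Float64}} (values in watts).
println("collected ", length(times), " time stamps")
```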

GPUInspector.livemonitor_something - Method
livemonitor_something(f, duration) -> times, values

Monitor some property of the GPU(s), as specified through the function f, over a time period of duration seconds. Returns the (relative) times as a Vector{Float64} and the monitored values as a Vector{Vector{Float64}}.

The function f will be called on a vector of devices (CuDevices or NVML.Devices) and should return a vector of Float64 values.

Keyword arguments:

  • freq (default: 1): polling rate in Hz.
  • devices (default: NVML.devices()): CuDevices or NVML.Devices to consider.
  • plot (default: false): Create a unicode plot after the monitoring.
  • liveplot (default: false): Create and update a unicode plot during the monitoring. Use optional ylims to specify fixed y limits.
  • title (default: ""): Title used in unicode plots.
  • ylabel (default: "Values"): y label used in unicode plots.

See: livemonitor_temperature, livemonitor_powerusage
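
The contract between f, duration, and freq can be illustrated with a small, GPU-free sketch. The function poll_sketch, the mock device symbols, and mock_f below are hypothetical stand-ins and not part of GPUInspector.jl. The sketch stores one vector of readings per polling step; whether the package lays out its return value the same way is an assumption.

```julia
# Hypothetical stand-in for a polling monitor; NOT the GPUInspector.jl implementation.
function poll_sketch(f, duration; freq=1, devices=[:gpu0, :gpu1])
    times = Float64[]            # relative time stamps in seconds
    values = Vector{Float64}[]   # one vector of readings per polling step (assumed layout)
    t0 = time()
    t = 0.0
    while t <= duration
        push!(times, t)
        push!(values, f(devices))  # f maps a vector of devices to a Vector{Float64}
        sleep(1 / freq)            # wait one polling interval (freq is in Hz)
        t = time() - t0
    end
    return times, values
end

# Mock property: a constant reading per device, standing in for a real query.
mock_f(devs) = fill(42.0, length(devs))

times, values = poll_sketch(mock_f, 0.5; freq=10)
println(length(times), " samples, each with ", length(first(values)), " readings")
```

With the real package, the analogous call would follow the signature above, livemonitor_something(f, duration; freq=...), with f querying the actual devices instead of returning mock values.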

GPUInspector.livemonitor_temperature - Method
livemonitor_temperature(duration) -> times, temperatures

Monitor the temperature of GPU(s) over a given time period, as specified by duration (in seconds). Returns the (relative) times as a Vector{Float64} and the temperatures as a Vector{Vector{Float64}}.

For general keyword arguments, see livemonitor_something.

GPUInspector.monitoring_start - Method
monitoring_start(; freq=1, devices=CUDA.devices(), thread=Threads.nthreads(), verbose=true)

Start monitoring of GPU temperature, utilization, power usage, etc.

Keyword arguments:

  • freq (default: 1): polling rate in Hz.
  • devices (default: CUDA.devices()): CuDevices or NVML.Devices to monitor.
  • thread (default: Threads.nthreads()): id of the Julia thread that should run the monitoring.
  • verbose (default: true): toggle verbose output.

See also monitoring_stop.

GPUInspector.monitoring_stop - Method
monitoring_stop(; verbose=true) -> results

Stops the GPU monitoring and returns the measured values. Specifically, results is a named tuple with the following keys:

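  • time: the (relative) times at which the measurements were taken
  • temperature, power, compute, mem: the measured values for each quantity (GPU temperature, power usage, compute utilization, and memory utilization)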

See also monitoring_start and plot_monitoring_results.
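
A typical start/stop workflow might look like the following sketch. It assumes GPUInspector.jl and CUDA.jl are installed, an NVIDIA GPU is available, and that the value returned by monitoring_stop can be passed directly to plot_monitoring_results, as the cross-reference above suggests. The matrix-multiplication workload and the freq value are arbitrary placeholders.

```julia
using GPUInspector
using CUDA  # assumption: needed for CUDA.devices() and the placeholder workload

# Start background monitoring, polling twice per second.
monitoring_start(; freq=2, verbose=false)

# Placeholder workload (for illustration only): repeated matrix products on the GPU.
A = CUDA.rand(Float32, 4096, 4096)
for _ in 1:20
    A * A
end
CUDA.synchronize()  # wait for the GPU work to finish before stopping

# Stop monitoring and collect the measurements.
results = monitoring_stop(; verbose=false)

# Render the collected quantities as unicode plots.
plot_monitoring_results(results)
```

Since the monitoring runs on a dedicated Julia thread (see the thread keyword), starting Julia with more than one thread, for example with julia -t 4, is probably advisable.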

GPUInspector.plot_monitoring_results - Function
plot_monitoring_results(r::MonitoringResults, symbols=keys(r.results))

Plot the quantities of a MonitoringResults object selected by symbols. Generates a text-based plot, suitable for terminals and log files, using UnicodePlots.jl.

GPUInspector.savefig_monitoring_results - Function
savefig_monitoring_results(r::MonitoringResults, symbols=keys(r.results))

Save plots of the quantities of a MonitoringResults object selected by symbols to disk. Note: only available if CairoMakie.jl is loaded alongside GPUInspector.jl.