NVIDIA NVML plugin

The nvml plugin monitors NVIDIA GPUs.

Requirements

  • Linux
  • NVIDIA GPU(s)
  • NVIDIA drivers installed. You probably want to use the packages provided by your Linux distribution.

Metrics

Here are the metrics collected by the plugin's source(s). One source will be created per GPU device.

| Name | Type | Unit | Description | Resource | ResourceConsumer | Attributes |
|---|---|---|---|---|---|---|
| nvml_energy_consumption | Counter Diff | milliJoule | Average between 2 measurement points based on the consumed energy since the last boot | GPU | LocalMachine | |
| nvml_instant_power | Gauge | milliWatt | Instant power consumption | GPU | LocalMachine | |
| nvml_temperature_gpu | Gauge | Celsius | Main temperature emitted by a given device | GPU | LocalMachine | |
| nvml_gpu_utilization | Gauge | Percentage (0-100) | GPU rate utilization | GPU | LocalMachine | |
| nvml_encoder_sampling_period | Gauge | Microsecond | Current utilization and sampling size for the encoder | GPU | LocalMachine | |
| nvml_decoder_sampling_period | Gauge | Microsecond | Current utilization and sampling size for the decoder | GPU | LocalMachine | |
| nvml_n_compute_processes | Gauge | None | Relevant currently running computing processes data | GPU | LocalMachine | |
| nvml_n_graphic_processes | Gauge | None | Relevant currently running graphical processes data | GPU | LocalMachine | |
| nvml_memory_utilization | Gauge | Percentage | GPU memory utilization by a process | Process | LocalMachine | |
| nvml_encoder_utilization | Gauge | Percentage | GPU video encoder utilization by a process | Process | LocalMachine | |
| nvml_decoder_utilization | Gauge | Percentage | GPU video decoder utilization by a process | Process | LocalMachine | |
| nvml_sm_utilization | Gauge | Percentage | Utilization of the GPU streaming multiprocessors by a process (3D task and rendering, etc...) | Process | LocalMachine | |
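
Because nvml_energy_consumption is a Counter Diff expressed in milliJoule, each value covers the interval between two measurement points rather than a single instant. The following Python sketch shows one way to relate such values to an average power in milliWatt. It is only an illustration, not the plugin's implementation: the function names, the sample readings and the 1-second interval are all assumptions made for this example.

```python
def counter_diffs(cumulative_mj):
    """Turn cumulative energy readings (mJ) into per-interval differences (mJ)."""
    return [later - earlier for earlier, later in zip(cumulative_mj, cumulative_mj[1:])]


def average_power_mw(energy_mj, interval_s):
    """Average power (mW) over an interval, from the energy (mJ) consumed during it."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # 1 mJ per second is 1 mW.
    return energy_mj / interval_s


# Hypothetical cumulative readings taken 1 second apart (assumed values).
readings_mj = [1_000_000, 1_150_000, 1_330_000]
diffs_mj = counter_diffs(readings_mj)
powers_mw = [average_power_mw(diff, 1.0) for diff in diffs_mj]
```

With a different poll interval, the same energy difference would correspond to a different average power, so the interval has to be known when converting.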

Configuration

Here is an example of how to configure this plugin. Put the following in the configuration file of the Alumet agent (usually alumet-config.toml).

[plugins.nvml]
# Initial interval between two Nvidia measurements.
poll_interval = "1s"

# Initial interval between two flushes of Nvidia measurements.
flush_interval = "5s"

# On startup, the plugin inspects the GPU devices and detects their features.
# If `skip_failed_devices = true` (or is omitted), inspection failures will be logged and the plugin will continue.
# If `skip_failed_devices = false`, the first failure will make the plugin's startup fail.
skip_failed_devices = true
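
To make the effect of skip_failed_devices concrete, here is a minimal Python sketch of the behaviour described in the comments: when the option is true (or omitted), a failed inspection is logged and the device is skipped; when it is false, the first failure aborts startup. This is not the plugin's code. The `inspect_devices` and `fake_inspect` functions, the device names and the feature dictionary are hypothetical and exist only for this illustration.

```python
import logging


def inspect_devices(devices, inspect, skip_failed_devices=True):
    """Inspect each device and return the (device, features) pairs that succeeded."""
    usable = []
    for device in devices:
        try:
            usable.append((device, inspect(device)))
        except Exception as err:
            if not skip_failed_devices:
                # Strict mode: the first failure stops the whole startup.
                raise RuntimeError(f"inspection of {device} failed") from err
            # Lenient mode (the default): log the failure and move on.
            logging.warning("skipping %s: %s", device, err)
    return usable


def fake_inspect(device):
    """Hypothetical inspection that fails for one device."""
    if device == "gpu1":
        raise OSError("feature detection failed")
    return {"power": True}


usable = inspect_devices(["gpu0", "gpu1"], fake_inspect)
```

In this sketch, `usable` contains only `gpu0`. Calling `inspect_devices` with `skip_failed_devices=False` on the same input raises a `RuntimeError` instead.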

More information

Not all software uses the GPU to its full extent. For instance, to obtain non-zero values for the video encoding/decoding metrics, run video software such as ffmpeg.