
OAR plugin

The oar plugin gathers measurements about OAR jobs.

Requirements

  • A node with OAR installed and running. Both OAR 2 and OAR 3 are supported; the version must be set in the configuration (see oar_version below).
  • Both cgroups v1 and cgroups v2 are supported. Some metrics may not be available with cgroups v1.

Metrics

Here are the metrics collected by the plugin's sources.

| Name | Type | Unit | Description | Resource | ResourceConsumer | Attributes |
|---|---|---|---|---|---|---|
| cpu_time_delta | Delta | nanoseconds | time spent by the job executing on the CPU | LocalMachine | Cgroup | see below |
| cpu_percent | Gauge | Percent (100% = one core fully used) | cpu_time_delta / delta_t | LocalMachine | Cgroup | see below |
| memory_usage | Gauge | Bytes | total memory usage of the job | LocalMachine | Cgroup | see below |
| cgroup_memory_anonymous | Gauge | Bytes | anonymous memory usage | LocalMachine | Cgroup | see below |
| cgroup_memory_file | Gauge | Bytes | memory used to cache filesystem data | LocalMachine | Cgroup | see below |
| cgroup_memory_kernel_stack | Gauge | Bytes | memory allocated to kernel stacks | LocalMachine | Cgroup | see below |
| cgroup_memory_pagetables | Gauge | Bytes | memory reserved for the page tables | LocalMachine | Cgroup | see below |
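The cpu_percent formula and the cgroup_memory_* metrics can be illustrated with a minimal Rust sketch. It assumes cgroups v2, where the memory.stat file of a cgroup contains one "key value" pair per line, in bytes, and it assumes that the keys anon, file, kernel_stack and pagetables map to the metrics of the table. This mapping and the function names are illustrative assumptions, not taken from the plugin's source; memory_usage is not covered.

```rust
use std::collections::HashMap;

/// cpu_percent = cpu_time_delta / delta_t, scaled so that one fully used core = 100%.
/// Both durations are in nanoseconds. Hypothetical helper, not the plugin's API.
fn cpu_percent(cpu_time_delta_ns: u64, delta_t_ns: u64) -> f64 {
    if delta_t_ns == 0 {
        return 0.0; // avoid dividing by zero on the very first sample
    }
    cpu_time_delta_ns as f64 / delta_t_ns as f64 * 100.0
}

/// Extracts the cgroup_memory_* metrics from the content of a cgroup v2 memory.stat file.
/// The key-to-metric mapping below is an assumption made for this sketch.
fn memory_metrics(memory_stat: &str) -> HashMap<&'static str, u64> {
    let mapping = [
        ("anon", "cgroup_memory_anonymous"),
        ("file", "cgroup_memory_file"),
        ("kernel_stack", "cgroup_memory_kernel_stack"),
        ("pagetables", "cgroup_memory_pagetables"),
    ];
    let mut metrics = HashMap::new();
    for line in memory_stat.lines() {
        let mut parts = line.split_whitespace();
        if let (Some(key), Some(value)) = (parts.next(), parts.next()) {
            // Exact key match, so "file_mapped", "file_dirty", etc. are ignored.
            if let Some((_, metric)) = mapping.iter().find(|(k, _)| *k == key) {
                if let Ok(bytes) = value.parse::<u64>() {
                    metrics.insert(*metric, bytes);
                }
            }
        }
    }
    metrics
}
```

For example, a job whose cgroup accumulates 2 s of CPU time over a 1 s interval gets a cpu_percent of 200, which corresponds to two fully used cores.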

Attributes

The measurements produced by the oar plugin have the following attributes:

  • job_id: id of the OAR job.
  • user_id: id of the user that submitted the job.

The CPU measurements (cpu_time_delta and cpu_percent) have an additional attribute, kind, which can be one of:

  • total: time spent in kernel and user mode
  • system: time spent in kernel mode only
  • user: time spent in user mode only
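The three kind values can be illustrated with a minimal Rust sketch that assumes cgroups v2, where the cpu.stat file of a cgroup reports usage_usec, user_usec and system_usec in microseconds. The sketch converts these counters to nanoseconds and computes cpu_time_delta for each kind between two readings. The struct and function names are hypothetical, and the plugin may read its counters differently (in particular with cgroups v1).

```rust
/// Cumulative CPU times of a cgroup in nanoseconds, one field per `kind` value.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
struct CpuTimes {
    total: u64,  // kernel + user mode (usage_usec)
    user: u64,   // user mode only (user_usec)
    system: u64, // kernel mode only (system_usec)
}

/// Parses the content of a cgroup v2 cpu.stat file (values in microseconds).
fn parse_cpu_stat(cpu_stat: &str) -> CpuTimes {
    let mut times = CpuTimes::default();
    for line in cpu_stat.lines() {
        let mut parts = line.split_whitespace();
        let (Some(key), Some(value)) = (parts.next(), parts.next()) else { continue };
        let Ok(usec) = value.parse::<u64>() else { continue };
        let nanos = usec * 1_000; // microseconds to nanoseconds
        match key {
            "usage_usec" => times.total = nanos,
            "user_usec" => times.user = nanos,
            "system_usec" => times.system = nanos,
            _ => {} // other counters (nr_periods, throttled_usec, ...) are ignored
        }
    }
    times
}

/// cpu_time_delta for each kind, between two successive readings.
fn cpu_time_delta(prev: CpuTimes, curr: CpuTimes) -> CpuTimes {
    CpuTimes {
        total: curr.total.saturating_sub(prev.total),
        user: curr.user.saturating_sub(prev.user),
        system: curr.system.saturating_sub(prev.system),
    }
}
```

Using saturating_sub keeps the delta at zero instead of overflowing if a counter is reset, for example when a cgroup is recreated.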

Augmentation of the measurements of other plugins

The oar plugin also adds attributes to the measurements produced by the other plugins. If a measurement does not have a job_id attribute, it gets a new involved_jobs attribute, which contains the list of the IDs of the jobs running on the node at the time of the transformation.

This makes it possible to know, for each measurement, which jobs were running at that time. For the reasoning behind this feature, see issue #209.
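The augmentation rule can be sketched in Rust with a simplified data model. The Measurement struct, the AttrValue enum and the add_involved_jobs function are hypothetical stand-ins, not Alumet's actual types; the sketch only shows the rule itself: measurements that already carry a job_id are left untouched, and every other measurement receives the list of jobs running at the time of the transformation.

```rust
use std::collections::HashMap;

/// Simplified attribute value (hypothetical; Alumet's real attribute types differ).
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    U64(u64),
    ListU64(Vec<u64>),
}

/// Simplified measurement point (hypothetical).
#[derive(Debug, Clone)]
struct Measurement {
    metric: String,
    attributes: HashMap<String, AttrValue>,
}

/// Adds `involved_jobs` to every measurement that has no `job_id` attribute.
/// `running_jobs` holds the IDs of the jobs running on the node right now.
fn add_involved_jobs(measurements: &mut [Measurement], running_jobs: &[u64]) {
    for m in measurements.iter_mut() {
        if !m.attributes.contains_key("job_id") {
            m.attributes.insert(
                "involved_jobs".to_string(),
                AttrValue::ListU64(running_jobs.to_vec()),
            );
        }
    }
}
```

Since the list reflects the jobs running when the transformation executes, a measurement produced just before a job starts or ends may be attributed to a slightly different set of jobs.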

Configuration

Here is an example of how to configure this plugin. Put the following in the configuration file of the Alumet agent (usually alumet-config.toml).

[plugins.oar]
# The version of OAR, either "oar2" or "oar3".
oar_version = "oar3"
# Interval between each measurement.
poll_interval = "1s"
# If true, only monitor jobs and ignore other cgroups.
jobs_only = true
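To make the accepted values explicit, here is a minimal Rust sketch that validates oar_version and poll_interval. The enum, the function names and the duration syntax ("ms", "s" and "m" suffixes only) are assumptions made for illustration; Alumet's actual configuration parser may accept more formats or report errors differently. jobs_only is a plain boolean and needs no extra validation.

```rust
use std::time::Duration;

/// The two values accepted by `oar_version`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OarVersion {
    Oar2,
    Oar3,
}

/// Hypothetical validation of the `oar_version` setting.
fn parse_oar_version(s: &str) -> Result<OarVersion, String> {
    match s {
        "oar2" => Ok(OarVersion::Oar2),
        "oar3" => Ok(OarVersion::Oar3),
        other => Err(format!("unsupported oar_version {other:?}, expected \"oar2\" or \"oar3\"")),
    }
}

/// Simplified parser for `poll_interval` values such as "1s" or "500ms".
/// Hypothetical: only the "ms", "s" and "m" units are handled here.
fn parse_poll_interval(s: &str) -> Result<Duration, String> {
    let s = s.trim();
    // Check "ms" before "s", because "500ms" also ends with 's'.
    let (number, unit_ms) = if let Some(n) = s.strip_suffix("ms") {
        (n, 1)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, 1_000)
    } else if let Some(n) = s.strip_suffix('m') {
        (n, 60_000)
    } else {
        return Err(format!("missing unit in {s:?}"));
    };
    let value: u64 = number
        .parse()
        .map_err(|e| format!("invalid number in {s:?}: {e}"))?;
    Ok(Duration::from_millis(value * unit_ms))
}
```

With the example configuration above, these checks yield OarVersion::Oar3 and a one-second polling interval.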