Introduction

Welcome to the ALUMET user guide! If you want to measure something with ALUMET, you have come to the right place.

To skip the introduction and install ALUMET, click here.

What is ALUMET?

ALUMET is a modular and efficient software measurement tool. With ALUMET, you can:

  • measure the energy consumption of your CPU, GPU, and more
  • assign the energy consumption of hardware resources to their consumers (such as processes, K8S pods, containers, etc.)
  • gather performance metrics at a configurable frequency
  • monitor laptops, desktops and HPC servers
  • profile your applications

ALUMET (sometimes written "Alumet") is an acronym for Adaptive, Lightweight, Unified METrics.

High-level architecture

Diagram of Alumet (high-level view)

The measurement sources (in yellow), data transforms (in blue) and outputs (in green) are provided by plugins, not by Alumet's core. Plugins are developed as separate software libraries. This is an improvement over monolithic tools, because it allows developers to easily extend the capabilities of the tool, in every part of the measurement pipeline (sources -> transforms -> outputs).

We offer many "standard" plugins, but you are free to create your own, for instance if you want to gather metrics from a piece of hardware that we do not support. Please read the developer guide to learn more about the creation of plugins.

Performance

The L in Alumet stands for Lightweight. Why is Alumet "lightweight" compared to other measurement tools?

  1. Optimized pipeline: Alumet is written in Rust, optimized for minimal latency and low memory consumption.
  2. Efficient interfaces: When we develop a new measurement source, we try to find the most efficient way of measuring what we're interested in. As a result, many plugins are based on low-level interfaces, such as the Linux perf_events interface, instead of slower higher-level wrappers. In particular, we try to remove useless intermediate levels, such as calling an external program and parsing its text output.
  3. Pay only for what you need: Alumet's modularity allows you to create a bespoke measurement tool by choosing the plugins that suit your needs, and removing the rest. You don't need a mathematical model that assigns the energy consumption of hardware components to processes? Remove it, and enjoy an even smaller disk footprint, CPU overhead, memory use and energy consumption.

Read more about the advantages of Alumet on the next page, "Why ALUMET and not <X>?".

Does it work on <my_machine>?

For now, Alumet works in the following environments:

  • Operating Systems: Linux, macOS [1], Windows [1]
  • Hardware components [2]:
    • CPUs: Intel x86 processors (Sandy Bridge or more recent), AMD x86 processors (Zen 1 or more recent), NVIDIA Jetson CPUs (any model)
    • GPUs: NVIDIA dedicated GPUs, NVIDIA Jetson GPUs (any model)

[1] While the core of Alumet is cross-platform, many plugins only work on Linux, for example the RAPL and perf plugins. There is currently no macOS-specific or Windows-specific plugin, so Alumet will not be able to measure interesting metrics on these systems.

[2] If your computer contains both supported and unsupported components, you can still use Alumet (with the plugins corresponding to the supported components). It will simply not measure the unsupported components.

Why ALUMET and not <X>?

Every tool comes with its limitations, and Alumet is not the only measurement software out there. Here is why we think that Alumet may be better than other existing tools.

Generic and Unified

You can plug many different sources, outputs, and even transformation functions into Alumet, without modifying its core. Instead of having one specialized tool for the CPU, another one for the GPU, and another one for Kubernetes pods, each with a different interface, polling frequency and methodology, you can have a single instance of Alumet.

With Alumet, the tedious work that was previously duplicated is now factored into a common tool. This saves tedious work for the administrator: configuring the gathering of the metrics, giving the proper rights to each tool, saving the results to a database, etc. It also saves tedious work for the developer: supporting a new database, writing the configuration management code, optimizing each tool, etc.

Extensible

Alumet is made of a core, on top of which we add plugins. You choose the plugins that suit your needs and build a measurement application with them. You can also create new plugins with an easy-to-use, high-level API.

Some existing tools claim to have a plugin interface and modular code. However, their modularity is often limited to a few functions or abstract interfaces in a largely monolithic codebase. For example, supporting a new source of measurements often requires modifying the core of the tool. In contrast, Alumet offers modularity in a way that is both easy to use and powerful. The first piece of evidence for this advantage is that the core and the plugins are in distinct crates, rather than in a monolithic codebase. A second piece of evidence is that Alumet plugins have fewer restrictions than Telegraf plugins. For instance, the same Alumet plugin can provide sources, transforms and outputs. A plugin can also modify the pipeline's configuration at runtime, without prior knowledge of the other plugins.

Lightweight and fast

Alumet is written in Rust and optimized for minimal latency and low memory consumption. Furthermore, many plugins are based on low-level sensors like the perf_events interface of the Linux kernel. Finally, the plugin system allows you to only include the plugins that suit your needs, instead of installing a do-it-all monolith.

Our preliminary results suggest that Alumet uses fewer CPU cycles, consumes less memory and is overall more efficient than the existing tools. We will upload benchmark results in the future.

Adaptive

We worked hard to provide two forms of adaptation:

  1. Adapting to your needs at compile-time: thanks to Alumet's modularity, you are able to build a measurement software tailored to your needs.
  2. Adapting to the context at run-time: unlike other tools, Alumet allows the pipeline to be reconfigured on-the-fly. With our novel approach, you can switch from monitoring at 1 Hz to profiling at 1000 Hz without restarting anything.

Rigorous

Finally, the project is based on active research work, involving both academia and industry. One of our goals is to overcome the limitations and mistakes that we found in the other tools. We want to produce a robust tool that will offer accurate measurements in different contexts, such as CS research, HPC clusters and Cloud services.

People at BULL SAS (part of Eviden) and the LIG (Grenoble's laboratory of computer science) are working on Alumet.

Installing Alumet

⚠️  Alumet is currently in Beta.

If you have trouble using Alumet, do not hesitate to get in touch with us: we will help you find a solution. If you think that you have found a bug, please open an issue in the repository.

For the moment, the only way to use Alumet is to download its sources and to compile it (see below). We intend to provide easy-to-use packages in the future.

Compiling from source

Prerequisite: you need to install the Rust toolchain.

Open a Terminal and download the repository:

git clone https://github.com/alumet-dev/alumet.git

The Alumet repository contains multiple crates ("crates" are Rust libraries/packages). To run Alumet, we are interested in app-agent, which produces a runnable measurement tool by compiling the core of Alumet and a set of standard plugins into a single binary.

Let's compile this agent.

cd alumet/app-agent
cargo build

The binary should be located in ../target/debug/alumet-agent. You can check this with a simple ls:

ls ../target/debug/alumet-agent

If the agent is there, you can run it. Otherwise, look into the target directory to find the agent.

As a first step, let's use --help to learn about the available arguments.

$ ../target/debug/alumet-agent --help
[2024-05-14T17:58:00Z INFO  alumet_agent] Starting ALUMET agent v0.4.1
Command line arguments

Usage: alumet-agent [OPTIONS] [COMMAND]

Commands:
  run           Run the agent and monitor the system
  exec          Execute a command and observe its process
  regen-config  Regenerate the configuration file and stop
  help          Print this message or the help of the given subcommand(s)

Options:
      --max-update-interval <MAX_UPDATE_INTERVAL>
          Maximum amount of time between two updates of the sources' commands.
          
          A lower value means that the latency of source commands will be lower, i.e. commands will be applied faster, at the cost of a higher overhead.

  -h, --help
          Print help (see a summary with '-h')

To observe your machine, the simplest way is to use the run command.

../target/debug/alumet-agent run

Alumet will then start to observe your machine.

Required privileges

Measuring some metrics, like RAPL energy counters and perf_events, requires specific privileges. Alumet will warn you about missing privileges and will suggest commands to fix the issue (there are several options).

In any case, please do not use sudo cargo run, because that would compile the project with the root user, making it unusable for your user account.

Output file and configuration

With the standard set of plugins, the measurements are saved in a CSV file. By default, this is alumet-output.csv. You can change this by editing alumet-config.toml. Note that the configuration file is automatically created by Alumet if it does not exist.

Learn more about Alumet config here.

Enabling more plugins

By default, only some plugins are enabled. To enable a plugin and include it in the Alumet agent binary, perform these two steps:

  1. Add a dependency on the plugin.
  2. Modify main.rs to enable the plugin.

Here is how to do it with the NVIDIA plugin.

  1. In the directory of app-agent, run cargo add plugin-nvidia
  2. Open src/main.rs, locate the line that contains static_plugins! (line 31) and add plugin_nvidia::NvidiaPlugin to the list of plugins. It should look like the following:
// Specifies the plugins that we want to load.
let plugins = static_plugins![RaplPlugin, CsvPlugin, SocketControlPlugin, PerfPlugin, plugin_nvidia::NvidiaPlugin];

Then, recompile the agent with cargo build.

Note: if you want to run Alumet on an NVIDIA Jetson device, replace cargo add plugin-nvidia with cargo add plugin-nvidia --features jetson --no-default-features.

Tips

Default command

Since run is the default command, you can also run the agent without any argument.

../target/debug/alumet-agent

Path to the binary

When building with a default target (which links to libc) and for the host architecture, the binary produced by cargo is located at:

  • ../target/debug/alumet-agent in debug mode
  • ../target/release/alumet-agent in release mode

The aforementioned paths are relative to the app-agent directory.

Release mode

By default, the measurement tool is built in debug mode, which enables better diagnostics but disables many optimizations. To deploy Alumet "in production", you should use the release mode by adding --release to the cargo flags. For instance, cargo build --release produces the optimized binary without running it.

Configuration file

The file alumet-config.toml contains the configuration of the Alumet agent. It is automatically created by Alumet if it does not exist.

Commented example

Here is a commented example of the configuration file that is generated by the agent.

# -- global agent config --

# upper bound of the interval between two updates of the commands received by the measurement sources
max_update_interval = "500ms"

# -- plugins configs, one table per plugin --

[plugins.rapl]
# Interval between each measurement of the RAPL plugin.
# Most plugins that provide measurement sources also provide this configuration option.
# Example: "1s" = 1 second
# Example: "1ms" = 1 millisecond
poll_interval = "1s"

# Measurements are kept in a buffer and are only sent to the next step of the Alumet pipeline
# when the flush interval expires.
flush_interval = "5s"

# Set to "true" to disable perf_events and only use powercap instead.
# By default, the rapl plugin tries to use perf_events, and falls back to powercap if that fails.
no_perf_events = false

[plugins.csv]
# Path to the output file that contains the measurements.
output_path = "alumet-output.csv"

# Flush the file writer after each write operation.
# A "write operation" may contain multiple measurement points.
force_flush = true

# In the "metric" column, append the unit to the name of each metric (except if the unit's name is empty).
# For instance, a metric "energy" with a unit "Joules" (symbol "J") will be serialized as "energy_J".
append_unit_to_metric_name = true

# Use the display name of the units instead of their unique name, as specified by the UCUM.
# See https://ucum.org/ucum for a list of units and their symbols.
use_unit_display_name = true

# The character to use to separate CSV columns.
csv_delimiter = ";"

[plugins.perf]
# List of hardware perf_events to measure.
hardware_events = ["REF_CPU_CYCLES", "CACHE_MISSES", "BRANCH_MISSES"]

# List of software perf_events to measure.
software_events = []

# List of cache perf_events to measure.
cache_events = ["LL_READ_MISS"]

# NOTE: for the moment, the perf plugin does not monitor the whole machine.
# It provides the ability to monitor a specific process or cgroup, but this
# ability needs to be explicitly turned on by another plugin.
# The Alumet agent automatically does this when run with the `exec` command.
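
To get a feel for how poll_interval and flush_interval relate, here is a small Python sketch. It is not part of Alumet: the helper parse_duration_ms is hypothetical and only understands the "s" and "ms" suffixes shown in the comments above (the agent may accept more formats). It converts the two intervals of the rapl plugin into a polling frequency and an approximate number of polls between two flushes.

```python
import re

def parse_duration_ms(text):
    """Convert a string such as "1s" or "500ms" to a number of milliseconds.

    Hypothetical helper: only the "s" and "ms" suffixes are handled.
    """
    match = re.fullmatch(r"(\d+)(ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * (1000 if unit == "s" else 1)

# Values copied from the [plugins.rapl] table of the example above.
rapl = {"poll_interval": "1s", "flush_interval": "5s"}

poll_ms = parse_duration_ms(rapl["poll_interval"])
flush_ms = parse_duration_ms(rapl["flush_interval"])

frequency_hz = 1000 / poll_ms          # one measurement every poll_ms milliseconds
polls_per_flush = flush_ms // poll_ms  # polls buffered before each flush

print(f"polling frequency: {frequency_hz:g} Hz")
print(f"polls buffered per flush: {polls_per_flush}")
```

With the default values, the source is polled at 1 Hz and roughly 5 polls are buffered before each flush. Setting poll_interval to "1ms" would correspond to 1000 Hz.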

Regenerating the file

When you change the plugins that are included in the agent, or when you install a new version of Alumet, the configuration options may change. You can replace the existing configuration file with a fresh, updated version by using the regen-config command.

Example:

alumet-agent regen-config
# Note: replace alumet-agent with the path to the binary application, or with `cargo run --`

Example (if you use cargo run):

cargo run -- regen-config

Command-line arguments

Some command-line arguments override the options defined in the configuration file. This is the case for --max-update-interval, which overrides max_update_interval from the config file.
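
The precedence rule can be illustrated with a few lines of Python. This is only a sketch of the idea, not Alumet's actual implementation (the agent is written in Rust): the function effective_max_update_interval and the program name alumet-agent-sketch are hypothetical. The option name follows the --help output of the agent, and the config value is the one from the commented example.

```python
import argparse

def effective_max_update_interval(argv, config):
    """Return the command-line value if it was given, else the config file value."""
    parser = argparse.ArgumentParser(prog="alumet-agent-sketch")
    # default=None lets us tell "not given" apart from an explicit value.
    parser.add_argument("--max-update-interval", default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.max_update_interval is not None:
        return args.max_update_interval
    return config["max_update_interval"]

# Stand-in for the parsed content of alumet-config.toml.
config = {"max_update_interval": "500ms"}

print(effective_max_update_interval([], config))
print(effective_max_update_interval(["--max-update-interval", "100ms"], config))
```

The first call prints 500ms (the config file wins because nothing was passed), the second prints 100ms (the argument overrides the file).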

Execution mode

When started with the exec command, the Alumet agent (built by the app-agent crate) spawns a new process with the specified command and stops when the process exits.

The execution mode automatically makes some plugins, such as the perf plugin, gather more metrics about the spawned process.
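
Conceptually, the execution mode follows a "spawn, observe, stop on exit" loop. The Python sketch below illustrates that loop under simplifying assumptions: it is not how the agent is implemented, the function exec_and_observe is hypothetical, and the "sample" is just a timestamp paired with the process id, where a real tool would read counters instead.

```python
import subprocess
import sys
import time

def exec_and_observe(command, poll_interval=0.05):
    """Spawn `command`, record one sample per loop iteration, stop when it exits."""
    samples = []
    process = subprocess.Popen(command)
    while True:
        # Placeholder measurement: a real tool would read counters here.
        samples.append((time.monotonic(), process.pid))
        try:
            # Wait at most poll_interval seconds for the process to exit.
            exit_code = process.wait(timeout=poll_interval)
            break
        except subprocess.TimeoutExpired:
            continue
    return exit_code, samples

# Portable stand-in for `sleep`: a Python child process that sleeps briefly.
exit_code, samples = exec_and_observe(
    [sys.executable, "-c", "import time; time.sleep(0.2)"]
)
print(f"exit code: {exit_code}, samples taken: {len(samples)}")
```

The observation stops as soon as the child process exits, which mirrors the behavior described above for alumet-agent exec.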

Example

Let's try this feature with a simple sleep command.

alumet-agent exec sleep 1

With the standard plugins (rapl, perf, csv), the resulting CSV file looks like the following (formatted to make it easier to read).

metric                       ;timestamp                      ;value             ;resource_kind ;resource_id ;consumer_kind ;consumer_id ;__late_attributes
perf_hardware_REF_CPU_CYCLES ;2024-05-14T21:28:49.416768909Z ; 0                ;local_machine ;            ;process       ;     728039 ;
perf_hardware_CACHE_MISSES   ;2024-05-14T21:28:49.416768909Z ; 0                ;local_machine ;            ;process       ;     728039 ;
perf_hardware_BRANCH_MISSES  ;2024-05-14T21:28:49.416768909Z ; 0                ;local_machine ;            ;process       ;     728039 ;
perf_cache_LL_READ_MISS      ;2024-05-14T21:28:49.416768909Z ; 0                ;local_machine ;            ;process       ;     728039 ;
rapl_consumed_energy_J       ;2024-05-14T21:28:50.389874134Z ; 0.89825439453125 ;dram          ;          0 ;local_machine ;            ;domain=dram
rapl_consumed_energy_J       ;2024-05-14T21:28:50.389874134Z ; 5.779296875      ;cpu_package   ;          0 ;local_machine ;            ;domain=pp0
rapl_consumed_energy_J       ;2024-05-14T21:28:50.389874134Z ;50.3060302734375  ;local_machine ;            ;local_machine ;            ;domain=platform
rapl_consumed_energy_J       ;2024-05-14T21:28:50.389874134Z ; 0                ;cpu_package   ;          0 ;local_machine ;            ;domain=pp1
rapl_consumed_energy_J       ;2024-05-14T21:28:50.389874134Z ;10.3299560546875  ;cpu_package   ;          0 ;local_machine ;            ;domain=package

There are several things to note here.

First, as expected, a simple sleep does not use any CPU cycles. This is reported by the perf_hardware_REF_CPU_CYCLES metric, whose value is 0.

Second, the computer consumed some energy during the sleep. This is reported by the rapl_consumed_energy_J metric. The J indicates that the measurements are in Joules. Note that perf_hardware_REF_CPU_CYCLES does not have a unit suffix, because it's a dimensionless value: a counter. In any case, the CSV plugin can be configured not to include the suffix in the resulting file. But let's go back to the RAPL metric. Here, the metric is given five times, because five different RAPL domains are available on this machine (dram, pp0, pp1, package and platform).

⚠️ As indicated by the value local_machine in the consumer_kind column, the metric rapl_consumed_energy_J does not report the energy consumed by the process spawned with alumet-agent exec. It reports the total energy consumption of the associated RAPL domain since the previous measurement of the metric (here, we only have one value).
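
If you want to post-process such a file, a short Python script is enough. The sketch below groups the RAPL energy by domain. It rests on a few assumptions: the rows are copied from the example above without the padding spaces that were added for readability, the delimiter is the ";" from the configuration example, and the last column is assumed to hold a single key=value pair such as domain=dram. The function energy_by_domain is hypothetical, not an Alumet API.

```python
import csv
import io

# Rows copied from the example output, without the readability padding.
SAMPLE = """\
metric;timestamp;value;resource_kind;resource_id;consumer_kind;consumer_id;__late_attributes
perf_hardware_REF_CPU_CYCLES;2024-05-14T21:28:49.416768909Z;0;local_machine;;process;728039;
rapl_consumed_energy_J;2024-05-14T21:28:50.389874134Z;0.89825439453125;dram;0;local_machine;;domain=dram
rapl_consumed_energy_J;2024-05-14T21:28:50.389874134Z;5.779296875;cpu_package;0;local_machine;;domain=pp0
rapl_consumed_energy_J;2024-05-14T21:28:50.389874134Z;50.3060302734375;local_machine;;local_machine;;domain=platform
rapl_consumed_energy_J;2024-05-14T21:28:50.389874134Z;0;cpu_package;0;local_machine;;domain=pp1
rapl_consumed_energy_J;2024-05-14T21:28:50.389874134Z;10.3299560546875;cpu_package;0;local_machine;;domain=package
"""

def energy_by_domain(csv_text, delimiter=";"):
    """Sum the rapl_consumed_energy_J values (in Joules) for each RAPL domain."""
    totals = {}
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        if row["metric"] != "rapl_consumed_energy_J":
            continue
        # Assumption: the column contains one "key=value" pair.
        key, _, domain = row["__late_attributes"].partition("=")
        if key != "domain":
            continue
        totals[domain] = totals.get(domain, 0.0) + float(row["value"])
    return totals

totals = energy_by_domain(SAMPLE)
for domain, joules in sorted(totals.items()):
    print(f"{domain}: {joules:.3f} J")
```

To read a real output file, replace io.StringIO(csv_text) with an open file handle. Keep in mind that these values are machine-wide, as explained in the warning above.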

Finally, the timestamps are serialized in the UTC timezone, hence the Z suffix.
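
The timestamps have nanosecond precision (nine fractional digits), which Python's datetime.strptime cannot parse directly because %f accepts at most six digits. Here is a minimal sketch that truncates the fraction to microseconds before parsing. The helper parse_alumet_timestamp is hypothetical and assumes the exact format shown in the example output.

```python
from datetime import datetime, timezone

def parse_alumet_timestamp(text):
    """Parse "YYYY-MM-DDTHH:MM:SS.fffffffffZ" into an aware UTC datetime.

    The fraction is truncated to microseconds (precision is lost).
    """
    if not text.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending with 'Z': {text!r}")
    base, _, fraction = text[:-1].partition(".")
    # Keep at most six fractional digits, padding with zeros if needed.
    micros = (fraction + "000000")[:6]
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)

# Timestamps taken from the perf and RAPL rows of the example output.
start = parse_alumet_timestamp("2024-05-14T21:28:49.416768909Z")
end = parse_alumet_timestamp("2024-05-14T21:28:50.389874134Z")
print(end - start)  # 0:00:00.973106
```

The difference between the two timestamps is just under one second, which is consistent with the sleep 1 command used in the example.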