Introduction
Welcome to the ALUMET user guide! If you want to measure something with ALUMET, you have come to the right place.
To skip the introduction and install ALUMET, go directly to the Installing Alumet section.
What is ALUMET?
ALUMET is a modular and efficient software measurement tool. With ALUMET, you can:
- measure the energy consumption of your CPU, GPU, and more
- assign the energy consumption of hardware resources to their consumers (such as processes, K8S pods, containers, etc.)
- gather performance metrics at a configurable frequency
- monitor laptops, desktops and HPC servers
- profile your applications
ALUMET (sometimes written "Alumet") is an acronym for Adaptive, Lightweight, Unified METrics.
High-level architecture
The measurement sources, data transforms and outputs are provided by plugins, not by Alumet's core. Plugins are developed as separate software libraries. This is an improvement over monolithic tools because it allows developers to easily extend the capabilities of the tool in every part of the measurement pipeline (sources -> transforms -> outputs).
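To make the sources -> transforms -> outputs flow more concrete, here is a minimal Rust sketch of such a pipeline. The traits Source, Transform and Output, the Point type and the three example implementations are hypothetical and greatly simplified: they are not Alumet's plugin API (the developer guide describes the real one), and a real measurement pipeline would run these steps repeatedly instead of in a single run_once call.

```rust
// Hypothetical, simplified model of a sources -> transforms -> outputs pipeline.
// These types are NOT Alumet's plugin API; they only illustrate the data flow.

/// One measured value.
#[derive(Debug, Clone, PartialEq)]
pub struct Point {
    pub metric: String,
    pub value: f64,
}

/// Produces measurement points, e.g. by reading a hardware counter.
pub trait Source {
    fn poll(&mut self) -> Vec<Point>;
}

/// Modifies the points before they reach the outputs.
pub trait Transform {
    fn apply(&mut self, points: &mut Vec<Point>);
}

/// Consumes the final points, e.g. by writing them to a file.
pub trait Output {
    fn write(&mut self, points: &[Point]);
}

/// Example source that always reports the same value.
pub struct ConstantSource {
    pub metric: String,
    pub value: f64,
}

impl Source for ConstantSource {
    fn poll(&mut self) -> Vec<Point> {
        vec![Point { metric: self.metric.clone(), value: self.value }]
    }
}

/// Example transform that multiplies every value by a constant factor.
pub struct Scale {
    pub factor: f64,
}

impl Transform for Scale {
    fn apply(&mut self, points: &mut Vec<Point>) {
        for p in points.iter_mut() {
            p.value *= self.factor;
        }
    }
}

/// Example output that prints each point on stdout.
pub struct StdoutOutput;

impl Output for StdoutOutput {
    fn write(&mut self, points: &[Point]) {
        for p in points {
            println!("{} = {}", p.metric, p.value);
        }
    }
}

/// Runs one pass of the pipeline: poll every source, apply the transforms
/// in order, send the result to every output, and return the final points.
pub fn run_once(
    sources: &mut [Box<dyn Source>],
    transforms: &mut [Box<dyn Transform>],
    outputs: &mut [Box<dyn Output>],
) -> Vec<Point> {
    let mut points: Vec<Point> = sources.iter_mut().flat_map(|s| s.poll()).collect();
    for t in transforms.iter_mut() {
        t.apply(&mut points);
    }
    for o in outputs.iter_mut() {
        o.write(&points);
    }
    points
}
```

Because every stage sits behind a trait, a new source, transform or output can be added without touching run_once, which is the kind of extensibility the paragraph above describes.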
We offer many "standard" plugins, but you are free to create your own, for instance if you want to gather metrics from a piece of hardware that we do not support. Please read the developer guide to learn more about the creation of plugins.
Performance
The L in Alumet stands for Lightweight. Why is Alumet "lightweight" compared to other measurement tools?
- Optimized pipeline: Alumet is written in Rust, optimized for minimal latency and low memory consumption.
- Efficient interfaces: When we develop a new measurement source, we try to find the most efficient way of measuring what we're interested in. As a result, many plugins are based on low-level interfaces, such as the Linux perf_events interface, instead of slower higher-level wrappers. In particular, we try to remove unnecessary intermediate layers, such as calling an external program and parsing its text output.
- Pay only for what you need: Alumet's modularity allows you to create a bespoke measurement tool by choosing the plugins that suit your needs, and removing the rest. You don't need a mathematical model that assigns the energy consumption of hardware components to processes? Remove it, and enjoy an even smaller disk footprint, CPU overhead, memory use and energy consumption.
Read more about the advantages of Alumet in the next section: Why ALUMET and not <X>?
Does it work on <my_machine>?
For now, Alumet works in the following environments:
- Operating systems: Linux, macOS and Windows (with limited support on macOS and Windows, see below)
- Hardware components:
- CPUs: Intel x86 processors (Sandy Bridge or more recent), AMD x86 processors (Zen 1 or more recent), NVIDIA Jetson CPUs (any model)
- GPUs: NVIDIA dedicated GPUs, NVIDIA Jetson GPUs (any model)
While the core of Alumet is cross-platform, many plugins only work on Linux, for example the RAPL and perf plugins. There are no macOS-specific or Windows-specific plugins for the moment, so Alumet will not be able to measure interesting metrics on these systems.
If your computer contains both supported and unsupported components, you can still use Alumet (with the plugins corresponding to the supported components). It will simply not measure the unsupported components.
Why ALUMET and not <X>?
Every tool comes with its limitations, and Alumet is not the only measurement software out there. Here is why we think that Alumet may be better than other existing tools.
Generic and Unified
You can plug many different sources, outputs, and even transformation functions into Alumet, without modifying its core. Instead of having one specialized tool for the CPU, another one for the GPU, and another one for Kubernetes pods, each with a different interface, polling frequency and methodology, you can have a single instance of Alumet.
With Alumet, the tedious work that was previously duplicated across tools is factored into a single common tool. This includes tedious work for the administrator (configuring the gathering of the metrics, giving the proper rights to each tool, saving the results to a database, etc.), but also tedious work for the developer (supporting a new database, writing the configuration management code, optimizing each tool, etc.).
Extensible
Alumet is made of a core, on top of which we add plugins. You choose the plugins that suit your needs and build a measurement application with them. You can also create new plugins with an easy-to-use, high-level API.
Some existing tools claim that they have a plugin interface and modular code. However, their modularity is often limited to a few functions or abstract interfaces in a largely monolithic codebase. For example, supporting a new source of measurements often requires modifying the core of the tool. In contrast, Alumet offers modularity in a way that is both easy to use and powerful. The first piece of evidence of this advantage is that the core and the plugins are in distinct crates, rather than in a monolithic codebase. A second piece of evidence is that Alumet plugins have fewer restrictions than Telegraf plugins. For instance, the same Alumet plugin can provide sources, transforms and outputs. A plugin can also modify the pipeline's configuration at runtime, without prior knowledge of the other plugins.
Lightweight and fast
Alumet is written in Rust and optimized for minimal latency and low memory consumption. Furthermore, many plugins are based on low-level sensors like the perf_events interface of the Linux kernel. Finally, the plugin system allows you to only include the plugins that suit your needs, instead of installing a do-it-all monolith.
Our preliminary results seem to show that Alumet uses fewer CPU cycles, consumes less memory and is overall more efficient than the existing tools. We will upload benchmark results in the future.
Adaptive
We worked hard to provide two forms of adaptation:
- Adapting to your needs at compile-time: thanks to Alumet's modularity, you are able to build a measurement software tailored to your needs.
- Adapting to the context at run-time: unlike other tools, Alumet allows the pipeline to be reconfigured on-the-fly. With our novel approach, you can switch from monitoring at 1 Hz to profiling at 1000 Hz without restarting anything.
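The sampling frequency and the interval between two measurements are related by interval = 1 / frequency: 1 Hz corresponds to a 1 s interval and 1000 Hz to a 1 ms interval. The Rust sketch below shows one generic way of making that interval adjustable while a polling loop keeps running: the loop re-reads a shared atomic value before each sleep. PollInterval, interval_from_hz and poll_loop are hypothetical names; this only illustrates the idea of run-time reconfiguration and is not how Alumet implements it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Converts a sampling frequency (in Hz) into the interval between two measurements.
pub fn interval_from_hz(hz: u64) -> Duration {
    assert!(hz > 0, "the frequency must be positive");
    Duration::from_nanos(1_000_000_000 / hz)
}

/// Shared, adjustable polling interval (hypothetical helper).
/// Clones share the same value, so one thread can change the interval
/// while another thread is polling.
#[derive(Clone)]
pub struct PollInterval(Arc<AtomicU64>);

impl PollInterval {
    pub fn new(interval: Duration) -> Self {
        // Stored as nanoseconds in a u64, which is enough for any realistic interval.
        PollInterval(Arc::new(AtomicU64::new(interval.as_nanos() as u64)))
    }

    /// Changes the interval; the loop picks it up at its next iteration.
    pub fn set(&self, interval: Duration) {
        self.0.store(interval.as_nanos() as u64, Ordering::Relaxed);
    }

    pub fn get(&self) -> Duration {
        Duration::from_nanos(self.0.load(Ordering::Relaxed))
    }
}

/// Takes `n` measurements, re-reading the interval before each sleep,
/// so the sampling rate can change without restarting the loop.
pub fn poll_loop(interval: &PollInterval, n: usize, mut measure: impl FnMut(usize)) {
    for i in 0..n {
        measure(i);
        thread::sleep(interval.get());
    }
}
```

A controller holding a clone of the PollInterval could, for instance, call set(interval_from_hz(1000)) when profiling starts and set(interval_from_hz(1)) when it ends, while the loop itself never stops.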
Rigorous
Finally, the project is based on active research work, involving both academia and industry. One of our goals is to overcome the limitations and mistakes that we found in the other tools. We want to produce a robust tool that will offer accurate measurements in different contexts, such as CS research, HPC clusters and Cloud services.
People at BULL SAS (part of Eviden) and the LIG (Grenoble's laboratory of computer science) are working on Alumet.
Installing Alumet
⚠️ Alumet is currently in Beta.
If you have trouble using Alumet, do not hesitate to contact us; we will help you find a solution. If you think that you have found a bug, please open an issue in the repository.
For the moment, the only way to use Alumet is to download its sources and to compile it (see below). We intend to provide easy-to-use packages in the future.
Compiling from source
Prerequisite: a recent version of Rust is required (at least 1.76 for now). You can run rustc --version to check your version. The easiest way to install a recent version of Rust is to use rustup.
Open a Terminal and download the repository:
git clone https://github.com/alumet-dev/alumet.git
The Alumet repository contains multiple crates ("crates" are Rust libraries/packages).
To run Alumet, we are interested in app-agent, which produces a runnable measurement tool by compiling the core of Alumet and a set of standard plugins into a single binary. The crate app-agent contains multiple standard agents, as described in its README.
Let's compile the simplest agent, the "local" one.
cd alumet/app-agent
cargo build --bin alumet-local-agent --features local_x86
The binary should be located in ../target/debug/alumet-local-agent. You can check this with a simple ls:
ls ../target/debug/alumet-local-agent
If the agent is there, you can run it. Otherwise, look into the target directory to find the agent.
First, let's use --help to learn about the available arguments.
$ ../target/debug/alumet-local-agent --help
[2024-10-15T17:58:00Z INFO alumet_local_agent] Starting ALUMET agent v0.6.1
Command line arguments
Usage: alumet-local-agent [OPTIONS] [COMMAND]
Commands:
run Run the agent and monitor the system
exec Execute a command and observe its process
regen-config Regenerate the configuration file and stop
help Print this message or the help of the given subcommand(s)
Options:
--config <CONFIG>
Path to the config file
[default: alumet-config.toml]
[...]
I have omitted some lines here; run the command to discover the actual output :)
To observe your machine, the simplest way is to use the run command.
../target/debug/alumet-local-agent run
Alumet will start to monitor various hardware and software components.
Required privileges
Measuring some metrics, like RAPL energy counters and perf_events, requires specific privileges. Alumet will warn you about missing privileges and will suggest commands to fix the issue (there are several options).
In any case, please do not use sudo cargo run, because that would compile the project with the root user, making it unusable for your user account.
Output file and configuration
With the standard set of plugins, the measurements are saved in a CSV file.
By default, this is alumet-output.csv. You can change this by editing alumet-config.toml. Note that the configuration file is automatically created by Alumet if it does not exist.
Learn more about the Alumet configuration in the Configuration file section.
Enabling more plugins
By default, only some plugins are enabled. To enable a plugin and include it in the Alumet agent binary, perform these two steps:
- Add a dependency on the plugin.
- Modify local.rs to enable the plugin.
Here is how to do it with the NVIDIA plugin.
- In the directory of app-agent, run cargo add plugin-nvidia
- Open src/bin/local.rs, locate the line that contains static_plugins! and add plugin_nvidia::NvidiaPlugin to the list of plugins.
let plugins = static_plugins![
// ...
+ plugin_nvidia::NvidiaPlugin,
];
Then, recompile the agent with the same cargo build command as before.
Note: if you want to run Alumet on an NVIDIA Jetson device, replace cargo add plugin-nvidia with cargo add plugin-nvidia --features jetson --no-default-features.
Tips
Default command
Since run is the default command, you can also run the agent without any arguments.
../target/debug/alumet-local-agent
Path to the binary
When building for the host architecture with the default target (which links to libc), the binary produced by cargo is located at:
- ../target/debug/alumet-local-agent in debug mode
- ../target/release/alumet-local-agent in release mode
These paths are relative to the app-agent directory.
Release mode
By default, the measurement tool is built in debug mode, which enables better diagnostics but disables many optimizations.
To deploy Alumet "in production", you should use the release mode by adding --release to the cargo flags.
For instance:
cd alumet/app-agent
cargo build --bin alumet-local-agent --features local_x86 --release
The optimized agent will be saved to ../target/release/alumet-local-agent.
Configuration file
The file alumet-config.toml contains the configuration of the Alumet agent. It is automatically created by Alumet if it does not exist.
Commented example
Here is a commented example of the configuration file that is generated by the agent.
# -- global agent config --
# upper bound of the interval between two updates of the commands received by the measurement sources
max_update_interval = "500ms"
# -- plugins configs, one table per plugin --
[plugins.rapl]
# Interval between each measurement of the RAPL plugin.
# Most plugins that provide measurement sources also provide this configuration option.
# Example: "1s" = 1 second
# Example: "1ms" = 1 millisecond
poll_interval = "1s"
# Measurements are kept in a buffer and are only sent to the next step of the Alumet pipeline
# when the flush interval expires.
flush_interval = "5s"
# Set to "true" to disable perf_events and only use powercap instead.
# By default, the rapl plugin tries to use perf_events, and uses powercap if that fails.
no_perf_events = false
[plugins.csv]
# Path to the output file that contains the measurements.
output_path = "alumet-output.csv"
# Flush the file writer after each write operation.
# A "write operation" may contain multiple measurement points.
force_flush = true
# In the "metric" column, append the unit to the name of each metric (except if the unit's name is empty).
# For instance, a metric "energy" with a unit "Joules" (symbol "J") will be serialized as "energy_J".
append_unit_to_metric_name = true
# Use the display name of the units instead of their unique name, as specified by the UCUM.
# See https://ucum.org/ucum for a list of units and their symbols.
use_unit_display_name = true
# The character to use to separate CSV columns.
csv_delimiter = ";"
[plugins.perf]
# List of hardware perf_events to measure.
hardware_events = ["REF_CPU_CYCLES", "CACHE_MISSES", "BRANCH_MISSES"]
# List of software perf_events to measure.
software_events = []
# List of cache perf_events to measure.
cache_events = ["LL_READ_MISS"]
# NOTE: for the moment, the perf plugin does not monitor the whole machine.
# It provides the ability to monitor a specific process or cgroup, but this
# ability needs to be explicitly turned on by another plugin.
# The Alumet agent automatically does this when run with the `exec` command.
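Durations such as poll_interval and flush_interval are written as a number followed by a unit. The Rust sketch below shows how such values can be interpreted, how many polls accumulate in the buffer before each flush (5 with poll_interval = "1s" and flush_interval = "5s"), and the naming rule described for append_unit_to_metric_name. parse_duration, polls_per_flush and csv_metric_name are hypothetical helpers that only accept the units listed in the code; they are not the parser or the CSV plugin code used by Alumet, which may accept other formats.

```rust
use std::time::Duration;

/// Parses durations such as "500ms", "1s" or "5s" (hypothetical helper:
/// only "ms", "s", "min" and "h" are accepted here).
pub fn parse_duration(text: &str) -> Option<Duration> {
    let text = text.trim();
    // Split at the first non-digit character: "500ms" -> ("500", "ms").
    let split = text.find(|c: char| !c.is_ascii_digit())?;
    let (number, unit) = text.split_at(split);
    let n: u64 = number.parse().ok()?;
    match unit {
        "ms" => Some(Duration::from_millis(n)),
        "s" => Some(Duration::from_secs(n)),
        "min" => Some(Duration::from_secs(n.checked_mul(60)?)),
        "h" => Some(Duration::from_secs(n.checked_mul(3600)?)),
        _ => None,
    }
}

/// Number of polls whose measurements accumulate in the buffer before each flush
/// (integer division; `None` if the poll interval is zero).
pub fn polls_per_flush(poll_interval: Duration, flush_interval: Duration) -> Option<u128> {
    flush_interval.as_nanos().checked_div(poll_interval.as_nanos())
}

/// Name written in the "metric" column: the unit is appended after an underscore
/// only if `append_unit` is true and the unit name is not empty.
pub fn csv_metric_name(metric: &str, unit: &str, append_unit: bool) -> String {
    if append_unit && !unit.is_empty() {
        format!("{metric}_{unit}")
    } else {
        metric.to_string()
    }
}
```

For example, csv_metric_name("energy", "J", true) gives "energy_J", matching the example in the configuration comments, while a unit with an empty name leaves the metric name unchanged.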
Regenerating the file
When you change the plugins that are included in the agent, or when you install a new version of Alumet, the configuration options may change. You can replace the existing configuration file with a fresh, updated version of the configuration by using the regen-config command.
Example:
alumet-agent regen-config
# Note: replace alumet-agent with the path to the binary application, or with `cargo run --`
Example (if you use cargo run):
cargo run -- regen-config
Command-line arguments
Some command-line arguments override the options defined in the configuration file.
This is the case for --max-update-interval, which overrides max_update_interval from the config file.
Execution mode
When started with the exec command, the Alumet agent (built by the app-agent crate) spawns a new process with the specified command and stops when the process exits.
The execution mode automatically makes some plugins, such as the perf plugin, gather more metrics about the spawned process.
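The lifecycle of the execution mode (spawn a command, observe it while it runs, stop when it exits) can be sketched with Rust's standard std::process::Command API. exec_and_wait is a hypothetical helper, not Alumet's implementation; the comments only mark where a monitoring tool would start and stop its per-process measurements, using the PID of the child process.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Spawns `program` with `args`, waits for it to exit, and returns its PID and exit status.
/// Hypothetical helper illustrating an "exec"-style measurement run.
pub fn exec_and_wait(program: &str, args: &[&str]) -> io::Result<(u32, ExitStatus)> {
    let mut child = Command::new(program).args(args).spawn()?;
    let pid = child.id();
    // A monitoring tool would start its per-process measurements for `pid` here.
    let status = child.wait()?;
    // The process has exited: this is where the measurements would stop.
    Ok((pid, status))
}
```

For instance, exec_and_wait("sleep", &["1"]) follows the same lifecycle as alumet-agent exec sleep 1, without taking any measurement.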
Example
Let's try this feature with a simple sleep command.
alumet-agent exec sleep 1
With the standard plugins (rapl, perf, csv), the resulting CSV file looks like the following (formatted to make it easier to read).
metric ;timestamp ;value ;resource_kind ;resource_id ;consumer_kind ;consumer_id ;__late_attributes
perf_hardware_REF_CPU_CYCLES ;2024-05-14T21:28:49.416768909Z ; 0 ;local_machine ; ;process ; 728039 ;
perf_hardware_CACHE_MISSES ;2024-05-14T21:28:49.416768909Z ; 0 ;local_machine ; ;process ; 728039 ;
perf_hardware_BRANCH_MISSES ;2024-05-14T21:28:49.416768909Z ; 0 ;local_machine ; ;process ; 728039 ;
perf_cache_LL_READ_MISS ;2024-05-14T21:28:49.416768909Z ; 0 ;local_machine ; ;process ; 728039 ;
rapl_consumed_energy_J ;2024-05-14T21:28:50.389874134Z ; 0.89825439453125 ;dram ; 0 ;local_machine ; ;domain=dram
rapl_consumed_energy_J ;2024-05-14T21:28:50.389874134Z ; 5.779296875 ;cpu_package ; 0 ;local_machine ; ;domain=pp0
rapl_consumed_energy_J ;2024-05-14T21:28:50.389874134Z ;50.3060302734375 ;local_machine ; ;local_machine ; ;domain=platform
rapl_consumed_energy_J ;2024-05-14T21:28:50.389874134Z ; 0 ;cpu_package ; 0 ;local_machine ; ;domain=pp1
rapl_consumed_energy_J ;2024-05-14T21:28:50.389874134Z ;10.3299560546875 ;cpu_package ; 0 ;local_machine ; ;domain=package
There are several things to note here.
First, as expected, a simple sleep does not use any CPU cycles. This is reported by the perf_hardware_REF_CPU_CYCLES metric.
Second, the computer consumed some energy during the sleep. This is reported by the rapl_consumed_energy_J metric. The J indicates that the measurements are in joules. Note that perf_hardware_REF_CPU_CYCLES does not have a unit suffix, because it is a dimensionless value: a counter. In any case, the CSV plugin can be configured not to include the suffix in the resulting file (option append_unit_to_metric_name). But let's go back to the RAPL metric. Here, the metric is given five times, because five different RAPL domains are available on this machine (dram, pp0, pp1, package and platform).
⚠️ As indicated by the value local_machine in the consumer_kind column, the metric rapl_consumed_energy_J does not report the energy consumed by the process spawned with alumet-agent exec, but the total energy consumption of the associated RAPL domain (since the previous measurement of the metric, but here we only have one value).
Finally, the timestamps are serialized in the UTC timezone, hence the Z suffix.
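To post-process such a file, you can for instance sum the RAPL energy of each domain. The Rust sketch below assumes the column order shown in the example above and the ";" delimiter set by csv_delimiter, and reads the domain from the __late_attributes column (domain=...). Dividing an energy by the time elapsed since the previous measurement of the same domain gives an average power in watts. rapl_energy_by_domain and average_power_w are hypothetical helpers; a production script would be better served by a proper CSV library. Keep in mind that RAPL domains can overlap (pp0, the CPU cores, is part of package), so the per-domain totals should not be added together.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Sums the values of `rapl_consumed_energy_J` for each RAPL domain.
/// Assumed columns: metric;timestamp;value;resource_kind;resource_id;
/// consumer_kind;consumer_id;__late_attributes (the first line is the header).
pub fn rapl_energy_by_domain(csv: &str) -> BTreeMap<String, f64> {
    let mut totals = BTreeMap::new();
    for line in csv.lines().skip(1) {
        let cols: Vec<&str> = line.split(';').map(str::trim).collect();
        if cols.len() < 8 || cols[0] != "rapl_consumed_energy_J" {
            continue; // not a RAPL energy row (e.g. a perf metric)
        }
        let Ok(joules) = cols[2].parse::<f64>() else {
            continue; // skip malformed values
        };
        // The late attributes column looks like "domain=package".
        if let Some(domain) = cols[7].strip_prefix("domain=") {
            *totals.entry(domain.to_string()).or_insert(0.0) += joules;
        }
    }
    totals
}

/// Average power in watts, given an energy in joules and the duration it was measured over.
pub fn average_power_w(energy_j: f64, elapsed: Duration) -> f64 {
    energy_j / elapsed.as_secs_f64()
}
```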