In this guide you’ll learn how to set up and configure Node Exporter to collect Linux system metrics like CPU load and disk I/O and expose them as Prometheus-style metrics. You’ll then configure Prometheus to scrape Node Exporter metrics and optionally ship them to a Grafana instance. Finally, you’ll set up a preconfigured and curated set of recording rules, Grafana dashboards, and alerting rules. At the end of this guide you’ll have dashboards you can use to visualize your Linux system metrics, along with a set of preconfigured alerts.
In this guide we will do the following:
- Set up and configure Node Exporter to collect Linux system metrics like CPU load and disk I/O. Node Exporter will expose these as Prometheus-style metrics.
- Configure Prometheus to scrape Node Exporter metrics and optionally ship them to a Grafana instance.
- Set up a preconfigured and curated set of recording rules to cache frequent queries.
- Import Grafana dashboards to visualize your metrics data.
- Set up Prometheus alerting rules to alert on your metrics data.
Node Exporter publishes roughly 500 Prometheus time series by default. Depending on which collectors you enable, it may collect and publish far more metrics than this default set.
Related Content
- How to install and configure Prometheus AlertManager in Linux
- How to Set up Prometheus Node exporter in Kubernetes
- How to run Prometheus with docker and docker-compose
- How to Setup Promtail, Grafana and Loki for free Log Management in Debian 11
- How to install and set up Grafana in Ubuntu 20.04 using Ansible
- How To Install and Configure Prometheus On a Linux Server
- How to run Grafana Loki with docker and docker-compose
Prerequisites
Before you get started, you should have the following available to you:
- A Linux machine compatible with a Node Exporter release.
- Prometheus running in your environment or directly on the Linux machine. Check out How To Install and Configure Prometheus On a Linux Server.
- Grafana running in your environment or directly on the Linux machine. Check out How to install and configure Grafana OSS in Debian 11.
Step 1: Setting up Node Exporter
In this step you’ll set up Node Exporter on your Linux machine to collect and expose system metrics.
To begin, log in to your machine and download the relevant Node Exporter binary. In this guide we’ll use linux-amd64, but you should choose the build that matches your system’s OS and architecture. Head over to the Node Exporter releases page, grab the latest version, and then run the following command, replacing 1.3.1 with the version you want to install.
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
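Optionally, you can verify the download against the checksum file published alongside the release (at the time of writing the Node Exporter release page provides a sha256sums.txt; adjust the URL if the layout differs, and note that --ignore-missing requires GNU coreutils):
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.3.1/sha256sums.txt
sha256sum --check --ignore-missing sha256sums.txt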
Extract the tarball and cd into the directory:
tar xvfz node_exporter-*.*-amd64.tar.gz
cd node_exporter-*.*-amd64
Move the node_exporter binary into /usr/local/bin:
sudo mv node_exporter /usr/local/bin/
You can now run Node Exporter by typing node_exporter. However, the better approach is to run it under a service manager so it keeps running beyond your current session.
Step 2: Setting up a systemd service for Node Exporter
Let’s create a systemd service to manage Node Exporter:
sudo vim /etc/systemd/system/node_exporter.service
Add the following content to the file. The --collector.systemd and --collector.processes flags enable two collectors that are disabled by default; drop them if you only need the default metric set.
[Unit]
Description=Node Exporter service
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.processes
[Install]
WantedBy=multi-user.target
Reload systemd so the new unit is registered:
sudo systemctl daemon-reload
Start the service
sudo systemctl start node_exporter
Check status to confirm that it is running:
$ sudo systemctl status node_exporter
● node_exporter.service - Node Exporter service
Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2022-03-14 07:49:03 UTC; 19s ago
Main PID: 4103504 (node_exporter)
Tasks: 4 (limit: 23472)
Memory: 2.3M
CGroup: /system.slice/node_exporter.service
└─4103504 /usr/local/bin/node_exporter
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=thermal_zone
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=time
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=timex
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=udp_queues
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=uname
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=vmstat
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=xfs
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=zfs
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:199 level=info msg="Listening on" address=:9100
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.615Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
If all goes as expected, you should see Active: active (running), confirming that the service is up and running.
Enable the service to start on boot:
sudo systemctl enable node_exporter
The exporter is now running and listening on port 9100. Test it by doing a manual scrape:
curl localhost:9100/metrics
You can filter the output to look at just a subset of metrics, like this:
curl -s localhost:9100/metrics | grep filesystem
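You can also get a rough count of the time series the exporter currently exposes by excluding the comment lines that begin with #:
curl -s localhost:9100/metrics | grep -vc '^#'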
Step 3: Allowing the service through the firewall
If you have a firewall running, you will need to allow traffic to port 9100. If you use firewalld, open the port with:
sudo firewall-cmd --permanent --add-port=9100/tcp
sudo firewall-cmd --reload
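If your machine uses ufw (common on Ubuntu and Debian) instead of firewalld, the equivalent would be:
sudo ufw allow 9100/tcp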
Step 4: Scraping Node Exporter using Prometheus
Now that Node Exporter is up and running on your machine, you can configure a Prometheus scrape job to collect and store Node Exporter metrics.
Add the following scrape job config to the scrape_configs section of your prometheus.yml configuration file:
  - job_name: node
    static_configs:
      - targets: ['linux_machine_IP_address:9100']
Replace linux_machine_IP_address with the IP address of the machine running Node Exporter. If you’re running Prometheus on the same machine, this will be localhost.
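Before restarting Prometheus, you can optionally sanity-check the edited file with promtool, which ships alongside the Prometheus binary in the release tarball (adjust the path if your prometheus.yml lives elsewhere, for example under /etc/prometheus/):
promtool check config prometheus.yml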
You will have to restart Prometheus to apply the changes:
sudo systemctl restart prometheus
If you don’t have a prometheus.yml configuration file yet, create a simple one: open your preferred text editor and paste in the following Prometheus configuration:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['linux_machine_IP_address:9100']
This configuration tells Prometheus to scrape all jobs every 15 seconds. The only configured scrape job is called node and defines a linux_machine_IP_address:9100 target. By default, Prometheus will scrape the /metrics endpoint using HTTP.
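If you ever need a different path or HTTPS, these defaults can be overridden per scrape job; the snippet below simply spells out the default values explicitly:
scrape_configs:
  - job_name: node
    scheme: http            # default
    metrics_path: /metrics  # default
    static_configs:
      - targets: ['linux_machine_IP_address:9100']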
Save and close the file. You can then run Prometheus with the file using the following command:
./prometheus --config.file=./prometheus.yml
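Once Prometheus has scraped the target at least once, you can confirm the job is healthy by querying the Prometheus HTTP API (adjust the host and port if Prometheus is not running locally on 9090):
curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=up{job="node"}'
A value of 1 for the up series means the last scrape of the node job succeeded.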
Step 5: Configuring recording rules
Using recording rules, you can precompute and cache frequently queried metrics. For example, if a dashboard panel uses a computationally intensive query like a rate(), you can create a recording rule that runs at a regular reduced interval and saves the result of the intensive query in a new time series. This avoids fetching and computing data every time the dashboard gets refreshed. To learn more about Prometheus recording rules, please see Recording Rules from the Prometheus docs.
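As a purely illustrative sketch of the format (the curated rule file linked below defines its own rule names and expressions), a recording rule that precomputes per-instance CPU utilisation could look like this:
groups:
  - name: node_exporter_recording_rules_example
    rules:
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))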
You should load the following recording rules before loading the dashboards in this guide. The dashboard queries use recording rules to reduce load on the Prometheus or Grafana Cloud Metrics servers, depending on where you’re evaluating the rules.
You can fetch the recording rule YAML file here.
Load recording rules into Prometheus
To load recording rules into Prometheus, add the following to your prometheus.yml configuration file:
rule_files:
  - "node_exporter_recording_rules.yml"
Be sure to replace node_exporter_recording_rules.yml with the path to your Node Exporter recording rules YAML file.
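You can optionally validate the rule file with promtool before reloading Prometheus; the same command also works for the alerting rules file in Step 7:
promtool check rules node_exporter_recording_rules.yml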
Step 6: Configuring dashboards
Get the dashboard here – dashboard 1860. Make sure you have added the Prometheus data source in Grafana, then select it when importing the dashboard. You should start seeing metrics stream in.
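If you prefer to define the data source as configuration rather than through the Grafana UI, Grafana can provision it from a YAML file; a minimal sketch, assuming a default install where provisioning files live under /etc/grafana/provisioning/datasources/ and Prometheus listens on localhost:9090, would be:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true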
Step 7: Configuring alerts
With Prometheus alerting rules, you can define alerts that fire when PromQL expressions breach some threshold or satisfy specified conditions over a period of time. For example, you can define a HighRequestLatency alert that fires when a request latency metric is greater than some threshold over a period of time. As soon as the alerting condition is triggered, the alert moves into the Pending state. After satisfying the condition for the period of time defined by the for parameter, the alert moves into the Firing state. You can configure routing and notifications for firing alerts using a tool like Alertmanager. Alertmanager is also built into Grafana Cloud.
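To illustrate the format (the curated alerting rule file linked below ships its own set of alerts), a simple rule that fires when a Node Exporter target has been unreachable for five minutes might look like this:
groups:
  - name: node_exporter_alerts_example
    rules:
      - alert: NodeExporterDown
        expr: up{job="node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node Exporter target {{ $labels.instance }} is down"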
You can fetch the alerting rule YAML file here.
Load alerting rules into Prometheus
To load alerting rules into Prometheus, add the following to your prometheus.yml configuration file:
rule_files:
  - "node_exporter_alerting_rules.yml"
Be sure to replace node_exporter_alerting_rules.yml with the path to your Node Exporter alerting rules YAML file. If you already added a rule_files section for the recording rules in Step 5, simply append this file to that existing list rather than adding a second rule_files key.
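As with the scrape configuration, Prometheus only picks up new rule_files entries after a restart (or a configuration reload). If you run it under systemd as in the earlier step:
sudo systemctl restart prometheus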
Conclusion
In this guide we installed and ran Node Exporter on a Linux machine, then configured Prometheus to scrape the system metrics it exposes. We loaded recording rules and alerting rules into Prometheus, and finally imported a Grafana dashboard to visualize the Linux system metrics.