How To Monitor Linux Servers Using Prometheus Node Exporter

In this guide you’ll learn how to set up and configure Node Exporter to collect Linux system metrics like CPU load and disk I/O and expose them as Prometheus-style metrics. You’ll then configure Prometheus to scrape Node Exporter metrics and optionally ship them to a Grafana instance. Finally, you’ll set up a preconfigured and curated set of recording rules, Grafana dashboards, and alerting rules. At the end of this guide you’ll have dashboards that you can use to visualize your Linux system metrics, and a set of preconfigured alerts.

In this guide we will do the following:

  • Set up and configure Node Exporter to collect Linux system metrics like CPU load and disk I/O. Node Exporter will expose these as Prometheus-style metrics.
  • Configure Prometheus to scrape Node Exporter metrics and optionally ship them to a Grafana instance.
  • Set up a preconfigured and curated set of recording rules to cache frequent queries.
  • Import Grafana dashboards to visualize your metrics data.
  • Set up Prometheus alerting rules to alert on your metrics data.

This exporter publishes roughly 500 Prometheus time series by default. Note that depending on its configuration, Node Exporter may collect and publish far more metrics than this default set. 

# Prerequisites

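Before you get started, you should have the following available to you:

  • A Linux machine that you can log in to with sudo privileges.
  • A Prometheus instance to scrape the metrics.
  • Optionally, a Grafana instance to host the dashboards.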

# Step 1: Setting up Node Exporter

In this step you’ll set up Node Exporter on your Linux machine to collect and expose system metrics.

To begin, log in to your machine and download the relevant Node Exporter binary. In this guide we’ll use linux-amd64, but you should choose the build that matches your system’s OS and architecture. Head over to the Node Exporter releases page, find the latest version, and download it with the command below, replacing 1.3.1 with the version you want to install.

curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
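If you are not sure which architecture suffix applies to your machine, the following sketch may help. It maps the output of uname -m to the suffix used in the release file names and prints the matching download URL. The helper names node_exporter_arch and node_exporter_url are made up for this example, and only a few common architectures are covered; check the releases page for anything else.

```shell
#!/bin/sh
# Hypothetical helper: map `uname -m` output to the architecture suffix
# used in Node Exporter release file names. Only common cases are handled;
# any other value is passed through unchanged.
node_exporter_arch() {
  case "$1" in
    x86_64)  echo "amd64" ;;
    aarch64) echo "arm64" ;;
    armv7l)  echo "armv7" ;;
    *)       echo "$1" ;;
  esac
}

# Hypothetical helper: build the release URL for a version and machine type.
node_exporter_url() {
  version="$1"
  arch="$(node_exporter_arch "$2")"
  echo "https://github.com/prometheus/node_exporter/releases/download/v${version}/node_exporter-${version}.linux-${arch}.tar.gz"
}

# Print the URL for this machine. Replace 1.3.1 with the version you want.
node_exporter_url "1.3.1" "$(uname -m)"
```

You can pass the printed URL to curl -LO in place of the hard-coded one above.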

Extract the tarball and cd into the extracted directory:

tar xvfz node_exporter-*.*-amd64.tar.gz
cd node_exporter-*.*-amd64

Move the node_exporter binary to /usr/local/bin:

sudo mv node_exporter /usr/local/bin/

You can now run Node Exporter by typing node_exporter. However, a process started this way stops when your session ends, so it is better to let a service manager run it.

# Step 2: Setting up a systemd service for Node Exporter

Create a systemd service to manage Node Exporter:

sudo vim /etc/systemd/system/node_exporter.service

Add this content to the file:

[Unit]
Description=Node Exporter service
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/bin/node_exporter --collector.systemd --collector.processes

[Install]
WantedBy=multi-user.target

Reload the systemd units so that the new service is registered:

sudo systemctl daemon-reload

Start the service:

sudo systemctl start node_exporter

Check status to confirm that it is running:

$ sudo systemctl status node_exporter
● node_exporter.service - Node Exporter service
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2022-03-14 07:49:03 UTC; 19s ago
 Main PID: 4103504 (node_exporter)
    Tasks: 4 (limit: 23472)
   Memory: 2.3M
   CGroup: /system.slice/node_exporter.service
           └─4103504 /usr/local/bin/node_exporter

Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=thermal_zone
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=time
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=timex
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=udp_queues
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=uname
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=vmstat
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=xfs
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:115 level=info collector=zfs
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.614Z caller=node_exporter.go:199 level=info msg="Listening on" address=:9100
Mar 14 07:49:03 staging-server.javaselfdrive.com node_exporter[4103504]: ts=2022-03-14T07:49:03.615Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false

If everything went as expected, you should see Active: active (running), confirming that the service is up and running.

Enable the service so that it starts on boot:

sudo systemctl enable node_exporter

The exporter is now running and listening on port 9100. Test it by doing a manual scrape:

curl localhost:9100/metrics

You can filter the output to look at just a subset of metrics, like this:

curl -s localhost:9100/metrics | grep filesystem
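To get a feel for how many series the exporter publishes, you can count the sample lines in the scrape output. The sketch below assumes the usual text exposition format, where lines starting with # are HELP and TYPE comments and every other non-empty line is one sample. The function names count_samples and metric_names are made up for this example, and the sample data is invented so that the sketch runs without a live exporter.

```shell
#!/bin/sh
# Count samples: every non-empty line that does not start with '#'.
count_samples() {
  grep -v -e '^#' -e '^$' | wc -l | tr -d ' '
}

# List distinct metric names by cutting each sample line
# at the first '{' or space.
metric_names() {
  grep -v -e '^#' -e '^$' | sed -e 's/[{ ].*//' | sort -u
}

# A small invented sample in the exposition format.
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
# HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1.2e+10
node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 4.1e+08'

printf '%s\n' "$sample" | count_samples
printf '%s\n' "$sample" | metric_names
```

Against a running exporter, you would pipe curl -s localhost:9100/metrics into the same functions instead of the invented sample.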

# Step 3: Allowing Node Exporter through the firewall

If a firewall is running on the machine, you will need to allow traffic to Node Exporter so that Prometheus can reach it. If you use firewalld, run these commands to open port 9100:

sudo firewall-cmd --permanent --add-port=9100/tcp
sudo firewall-cmd --reload

# Step 4: Scraping Node Exporter using Prometheus

Now that Node Exporter is up and running on your machine, you can configure a Prometheus scrape job to collect and store Node Exporter metrics.

Add the following scrape job config to the scrape_configs section of your prometheus.yml configuration file:

- job_name: node
  static_configs:
  - targets: ['linux_machine_IP_address:9100']

Replace linux_machine_IP_address with the IP address of the machine running Node Exporter. If you’re running Prometheus on the same machine, this will be localhost.

Restart Prometheus to apply the changes:

sudo systemctl restart prometheus

If you don’t have a prometheus.yml configuration file, create a simple one. Open your preferred text editor and paste in the following Prometheus configuration:

global:
  scrape_interval: 15s

scrape_configs:
- job_name: node
  static_configs:
  - targets: ['linux_machine_IP_address:9100']

This configuration tells Prometheus to scrape all jobs every 15 seconds. The only configured scrape job is called node and defines a linux_machine_IP_address:9100 target. By default, Prometheus will scrape the /metrics endpoint using HTTP.

Save and close the file. You can then run Prometheus with the file using the following command:

./prometheus --config.file=./prometheus.yml
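Before pointing Prometheus at a new configuration, it can help to validate the file first. The sketch below writes the same minimal configuration to a scratch directory and checks it with promtool, the command-line tool that ships with Prometheus, if it is found on the PATH. The NODE_EXPORTER_TARGET variable and the scratch directory are assumptions made for this example; adapt the path to wherever your real prometheus.yml lives.

```shell
#!/bin/sh
# Write a minimal Prometheus configuration to a scratch directory.
# NODE_EXPORTER_TARGET is a made-up variable for this example;
# it defaults to a Node Exporter running on the same machine.
workdir="$(mktemp -d)"
config="$workdir/prometheus.yml"
target="${NODE_EXPORTER_TARGET:-localhost:9100}"

cat > "$config" <<EOF
global:
  scrape_interval: 15s

scrape_configs:
- job_name: node
  static_configs:
  - targets: ['${target}']
EOF

echo "Wrote $config"

# Validate the file only if promtool is available.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$config"
else
  echo "promtool not found; skipping validation"
fi
```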

# Step 5: Configuring recording rules

Using recording rules, you can precompute and cache frequently queried metrics. For example, if a dashboard panel uses a computationally intensive query like a rate(), you can create a recording rule that runs at a regular reduced interval and saves the result of the intensive query in a new time series. This avoids fetching and computing data every time the dashboard gets refreshed. To learn more about Prometheus recording rules, please see Recording Rules from the Prometheus docs.

You should load the following recording rules before loading the dashboards in this guide. The dashboard queries use recording rules to reduce load on the Prometheus or Grafana Cloud Metrics servers, depending on where you’re evaluating the rules.

You can fetch the recording rule YAML file here.
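To give a sense of what a recording rule file looks like, the sketch below writes a small illustrative one and validates it with promtool if it is installed. It is not the curated rule file linked above: the group name, the rule name instance:node_cpu_busy:rate5m, and the expression are made up for this example, and the job="node" selector assumes the scrape job name used earlier in this guide.

```shell
#!/bin/sh
# Write an illustrative recording rule file to a scratch directory.
# The group and rule names are invented for this example.
workdir="$(mktemp -d)"
rules="$workdir/example_recording_rules.yml"

cat > "$rules" <<'EOF'
groups:
- name: example_node_recording_rules
  interval: 1m
  rules:
  # Fraction of CPU time spent non-idle per instance, averaged over all cores.
  - record: instance:node_cpu_busy:rate5m
    expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{job="node",mode="idle"}[5m]))
EOF

echo "Wrote $rules"

# Validate the file only if promtool (shipped with Prometheus) is available.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules"
else
  echo "promtool not found; skipping validation"
fi
```

Once a rule like this is loaded, a dashboard panel can query the recorded series by name instead of recomputing the rate() on every refresh.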

# Load recording rules into Prometheus

To load recording rules into Prometheus, add the following to your prometheus.yml configuration file:

rule_files:
  - "node_exporter_recording_rules.yml"

Be sure to replace node_exporter_recording_rules.yml with the path to your Node Exporter recording rules YAML file.

# Step 6: Configuring dashboards

Get the dashboard here – dashboard 1860. Make sure you have added Prometheus as a data source in Grafana, then select that data source for the dashboard. You should start seeing metrics stream in.

# Step 7: Configuring alerts

With Prometheus alerting rules, you can define alerts that fire when PromQL expressions breach some threshold or satisfy specified conditions over a period of time. For example, you can define a HighRequestLatency alert that fires when a request latency metric is greater than some threshold over a period of time. As soon as the alerting condition is triggered, the alert moves into Pending state. After satisfying the condition for the period of time defined by the for parameter, the alert moves into Firing state. You can configure routing and notifications for firing alerts using a tool like Alertmanager. Alertmanager is also built into Grafana Cloud.

You can fetch the alerting rule YAML file here.
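As an illustration of the Pending and Firing behavior described above, the sketch below writes a small alerting rule file and validates it with promtool if it is installed. It is not the curated rule file linked above: the alert name, the 10% threshold, the 10m for period, and the excluded filesystem types are all assumptions chosen for this example, and the job="node" selector assumes the scrape job name used earlier in this guide.

```shell
#!/bin/sh
# Write an illustrative alerting rule file to a scratch directory.
# The alert name, threshold, and duration are invented for this example.
workdir="$(mktemp -d)"
rules="$workdir/example_alerting_rules.yml"

# The heredoc delimiter is quoted so the shell leaves $labels untouched.
cat > "$rules" <<'EOF'
groups:
- name: example_node_alerting_rules
  rules:
  # Pending as soon as free space drops below 10%; Firing after 10 minutes.
  - alert: ExampleNodeFilesystemLowSpace
    expr: node_filesystem_avail_bytes{job="node",fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{job="node",fstype!~"tmpfs|overlay"} < 0.10
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Less than 10% space left on {{ $labels.mountpoint }} ({{ $labels.instance }})"
EOF

echo "Wrote $rules"

# Validate the file only if promtool (shipped with Prometheus) is available.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules"
else
  echo "promtool not found; skipping validation"
fi
```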

# Load alerting rules into Prometheus

To load alerting rules into Prometheus, add the following to your prometheus.yml configuration file:

rule_files:
  - "node_exporter_alerting_rules.yml"

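Be sure to replace node_exporter_alerting_rules.yml with the path to your Node Exporter alerting rules YAML file. If your prometheus.yml already has a rule_files section, for example from loading the recording rules, add this path as another entry in that list rather than adding a second rule_files key.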

# Conclusion

In this guide we installed and ran Node Exporter on a Linux machine. We then configured Prometheus to scrape the system metrics exposed by Node Exporter. Finally, we loaded recording rules and alerting rules into Prometheus and imported a Grafana dashboard to visualize the Linux system metrics.
