How to install and configure Prometheus Alertmanager on Linux

Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration, such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibiting alerts. For example, Prometheus can raise an alert when a scrape target becomes unavailable and send it to Alertmanager, which then notifies you that the target is down. Target availability is only one example: alerting rules can be written against any Prometheus metric, so you can alert on almost any condition you can express as a query.

In this guide, we will learn how to install and set up Alertmanager on Linux. We will also configure Prometheus and Alertmanager to send you a Slack notification when a Prometheus target is down (unavailable).

Prerequisites

You need a working Prometheus setup before proceeding. If you need help with that, set up Prometheus first using a Prometheus installation guide.

Installing Alert Manager

Alertmanager is distributed as a prebuilt tar archive on the Prometheus downloads page and on its GitHub releases page. Head over there and grab the latest version. I am using a 64-bit (amd64) Linux server, so this is the command that downloads the package:

$ curl -LO https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz

$ ls
alertmanager-0.25.0.linux-amd64.tar.gz
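The file name above is specific to amd64. If you script the download for several machines, the helper functions below are a minimal sketch of building the URL for the current architecture. The function names am_arch and am_tarball_url are hypothetical, the sketch assumes the release assets keep the alertmanager-<version>.linux-<arch>.tar.gz naming shown above, and it only handles x86_64 and aarch64.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building the Alertmanager download URL.
# Assumption: assets are named alertmanager-<version>.linux-<arch>.tar.gz
# under .../releases/download/v<version>/, as in the command above.

# Map `uname -m` output (or the first argument) to the release architecture suffix.
am_arch() {
  local machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    *)       echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}

# Print the download URL for a version (without the leading "v") and an architecture.
am_tarball_url() {
  local version="$1" arch="$2"
  echo "https://github.com/prometheus/alertmanager/releases/download/v${version}/alertmanager-${version}.linux-${arch}.tar.gz"
}
```

For example, on an x86_64 server, curl -LO "$(am_tarball_url 0.25.0 "$(am_arch)")" downloads the same archive as the command above.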

Once the download is complete, extract the archive and move the extracted directory to /opt/alertmanager:

$ tar -xzvf alertmanager-0.25.0.linux-amd64.tar.gz

alertmanager-0.25.0.linux-amd64/
alertmanager-0.25.0.linux-amd64/alertmanager.yml
alertmanager-0.25.0.linux-amd64/NOTICE
alertmanager-0.25.0.linux-amd64/amtool
alertmanager-0.25.0.linux-amd64/alertmanager
alertmanager-0.25.0.linux-amd64/LICENSE

$ sudo mv -v alertmanager-0.25.0.linux-amd64 /opt/alertmanager

The directory contains two important files: the alertmanager binary and alertmanager.yml, the configuration file with the initial configuration.

Since we will run Alertmanager as the prometheus user (which was created as part of the Prometheus setup), make sure that user owns the directory:

sudo chown -Rfv prometheus:prometheus /opt/alertmanager

Creating a Data Directory

Alertmanager needs a directory where it can store its data. Because Alertmanager will run as the prometheus system user, that user must have read, write, and execute permissions on the data directory.

You can create the data/ directory in the /opt/alertmanager/ directory as follows:

sudo mkdir -v /opt/alertmanager/data
sudo chown -Rfv prometheus:prometheus /opt/alertmanager/data
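The mkdir and chown pair above can also be done in one step with install -d, which creates a directory and sets its owner, group, and mode together. The sketch wraps that in a hypothetical helper, ensure_owned_dir. The 0750 mode is an assumption (full access for the owner, read and traverse for the group, nothing for others), not a requirement of this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a directory owned by the given user/group.
# Assumption: mode 0750 is acceptable for the Alertmanager data directory.
ensure_owned_dir() {
  local dir="$1" owner="$2" group="${3:-$2}"
  # install -d creates the directory and applies owner, group and mode in one call.
  install -d -o "$owner" -g "$group" -m 0750 "$dir"
}
```

On the server, the equivalent one-off command is sudo install -d -o prometheus -g prometheus -m 0750 /opt/alertmanager/data.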

Create a systemd Service unit for Alertmanager

To manage the service, we will use systemd, which lets us start, stop, and restart Alertmanager and have it start automatically at boot. Create a service file at the following path:

sudo vim /etc/systemd/system/alertmanager.service

Add the following content to the file:

[Unit]
Description=Alertmanager for prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/alertmanager/alertmanager \
  --config.file=/opt/alertmanager/alertmanager.yml \
  --storage.path=/opt/alertmanager/data
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

Save and exit the file.
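If you provision servers from a script rather than editing the file in vim, the same unit can be generated from a couple of parameters. This is a sketch: the function name render_alertmanager_unit is hypothetical, and its defaults simply mirror the paths and user used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the alertmanager.service unit shown above.
# Arguments: install directory (default /opt/alertmanager) and service user (default prometheus).
render_alertmanager_unit() {
  local dir="${1:-/opt/alertmanager}" user="${2:-prometheus}"
  # Unquoted heredoc: \\ becomes a literal backslash and \$MAINPID stays literal for systemd.
  cat <<EOF
[Unit]
Description=Alertmanager for prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=${user}
Group=${user}
Type=simple
ExecStart=${dir}/alertmanager \\
  --config.file=${dir}/alertmanager.yml \\
  --storage.path=${dir}/data
ExecReload=/bin/kill -HUP \$MAINPID
TimeoutStopSec=20s
SendSIGKILL=no

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, render_alertmanager_unit | sudo tee /etc/systemd/system/alertmanager.service >/dev/null writes the unit with the default paths.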

For the systemd changes to take effect, run the following command:

sudo systemctl daemon-reload

Now, start the alertmanager service with the following command:

sudo systemctl start alertmanager

Confirm that the service is running as expected by checking its status:

$ sudo systemctl status alertmanager
● alertmanager.service - Alertmanager for prometheus
   Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-02-09 10:02:37 UTC; 2h 27min ago
 Main PID: 599774 (alertmanager)
    Tasks: 8 (limit: 23406)
   Memory: 27.8M
   CGroup: /system.slice/alertmanager.service
           └─599774 /opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data

Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.309Z caller=main.go:240 level=info msg="Starting Alertmanager" version="(version=0.25.0, branch=HEAD, revision=258fab7cdd551f2cf251ed0348f0ad7289aee789)"
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.309Z caller=main.go:241 level=info build_context="(go=go1.19.4, user=root@abe866dd5717, date=20221222-14:51:36)"
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.327Z caller=cluster.go:185 level=info component=cluster msg="setting advertise address explicitly" addr=10.18.0.10 port=9094
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.340Z caller=cluster.go:681 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.382Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/opt/alertmanager/alertmanager.yml
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.383Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/opt/alertmanager/alertmanager.yml
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.388Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9093
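The last log line shows Alertmanager listening on port 9093. To check this from a script, for example right after a start or restart, you can poll Alertmanager's /-/ready endpoint. This sketch assumes curl is installed and Alertmanager listens on 127.0.0.1:9093; the function name wait_for_alertmanager is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll Alertmanager's readiness endpoint.
# Arguments: base URL (default http://127.0.0.1:9093) and number of attempts (default 30).
wait_for_alertmanager() {
  local base_url="${1:-http://127.0.0.1:9093}" attempts="${2:-30}"
  local i
  for ((i = 0; i < attempts; i++)); do
    # -f makes curl fail on HTTP errors, so only a successful response counts as ready.
    if curl -fsS --max-time 2 "${base_url}/-/ready" >/dev/null 2>&1; then
      echo "Alertmanager is ready at ${base_url}"
      return 0
    fi
    sleep 1  # wait about a second between attempts
  done
  echo "Alertmanager not ready after ${attempts} attempts" >&2
  return 1
}
```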

Enable the alertmanager service so that it starts automatically on boot:

sudo systemctl enable alertmanager

Configuring Prometheus

Now, configure Prometheus to send its alerts to Alertmanager. You can also monitor Alertmanager itself with Prometheus. This section covers both.

To scrape Alertmanager's own metrics, add a job to the scrape_configs section of the Prometheus configuration file.

In my case, Alertmanager and Prometheus run on the same host, so I use 127.0.0.1:9093 as the target. Otherwise, replace 127.0.0.1 with the IP address of your Alertmanager host. You can find the host's IP addresses with this command:

hostname -I
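hostname -I prints every address of the host on one line. The sketch below takes the first one and appends the Alertmanager port to form the target string. The function name alertmanager_target is hypothetical, and it assumes the first address is the one Prometheus can reach, which is not guaranteed on hosts with several interfaces (Docker bridges, VPNs).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<first host IP>:<port>" for use as a scrape/alerting target.
# Assumption: the first address from `hostname -I` is reachable from Prometheus.
alertmanager_target() {
  local port="${1:-9093}" ip
  ip="$(hostname -I | awk '{print $1}')"
  [ -n "$ip" ] || { echo "no IP address found" >&2; return 1; }
  echo "${ip}:${port}"
}
```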

This is what the relevant part of my prometheus.yml looks like after adding Alertmanager:

...
# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['127.0.0.1:9093']
...

Also, add the IP address and port of Alertmanager under the alerting > alertmanagers section of the prometheus.yml file:

...
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 127.0.0.1:9093
...

For the changes to take effect, restart the prometheus service as follows:

sudo systemctl restart prometheus
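A typo in prometheus.yml can stop Prometheus from starting again after the restart. A safer variant is to validate the file with promtool first; promtool check config also checks the rule files listed under rule_files. This sketch assumes promtool (shipped in the Prometheus release tarball) is on your PATH; the function name restart_prometheus_if_valid is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart Prometheus only if its configuration validates.
# Assumption: promtool is on PATH and the config lives at /opt/prometheus/prometheus.yml.
restart_prometheus_if_valid() {
  local cfg="${1:-/opt/prometheus/prometheus.yml}"
  if promtool check config "$cfg"; then
    sudo systemctl restart prometheus
  else
    echo "not restarting: ${cfg} failed validation" >&2
    return 1
  fi
}
```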

Visit http://<server_ip>:9090/targets in your web browser. The alertmanager job should be in the UP state, which confirms that Prometheus can reach Alertmanager.

Creating a Prometheus Alert Rule

In the Prometheus web UI, you can enter the expression up on the Graph page to see the state of every target Prometheus scrapes.

Targets that are UP (running and reachable by Prometheus) have the value 1, and targets that are DOWN (not running or unreachable by Prometheus) have the value 0.

To see this, stop one of the targets, for example node_exporter:

sudo systemctl stop node_exporter

The up value of that target in Prometheus should now be 0. So, you can use the expression up == 0 to list only the targets that are not running or are unreachable by Prometheus.
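You can run the same expression from the shell through Prometheus's HTTP query API (/api/v1/query), which is handy for scripts and quick checks over SSH. This sketch assumes curl is installed and Prometheus listens on 127.0.0.1:9090 unless PROM_URL is set; the function name prom_query is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: evaluate a PromQL expression via the HTTP API and print the JSON response.
# Assumption: Prometheus is at http://127.0.0.1:9090 unless PROM_URL overrides it.
prom_query() {
  local expr="$1" base_url="${PROM_URL:-http://127.0.0.1:9090}"
  # -G sends the --data-urlencode value as a URL query string, so "up == 0" is encoded safely.
  curl -fsS -G --data-urlencode "query=${expr}" "${base_url}/api/v1/query"
}
```

For example, prom_query 'up == 0' | jq -r '.data.result[].metric.instance' lists the instances that are currently down, assuming jq is installed.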

This expression can be used to create a Prometheus Alert and send alerts to Alert Manager when one or more targets are not running or inaccessible to Prometheus.

To create a Prometheus Alert, create a new file rules.yml in the /opt/prometheus/ directory as follows:

sudo vim /opt/prometheus/rules.yml

Now, type in the following lines in the rules.yml file.

groups:
- name: Instances
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'critical'
    # Prometheus templates apply here in the annotation and label fields of the alert.
    annotations:
      summary: 'Instance {{ $labels.instance }} down'
      description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'

Here, the InstanceDown alert fires when a target is not running or is unreachable by Prometheus (that is, up == 0) for one minute (1m).

Now, update the Prometheus configuration file /opt/prometheus/prometheus.yml as follows:

...
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/opt/prometheus/rules.yml"
...

Another important option in the global section of prometheus.yml is evaluation_interval: Prometheus evaluates all rules once every evaluation_interval. The example prometheus.yml that ships with Prometheus sets it to 15s (15 seconds), so the alerting rules in rules.yml are checked every 15 seconds. If the option is omitted, Prometheus falls back to 1m.

For the changes to take effect, restart the prometheus service:

sudo systemctl restart prometheus

Now, navigate to the URL http://<server_ip>:9090/rules from your favorite web browser, and you should see the rule InstanceDown that you’ve just added.

Navigate to the URL http://<server_ip>:9090/alerts from your favorite web browser, and you should see the state of the alert InstanceDown.

Because you stopped node_exporter earlier, the alert is active but in the PENDING state: the condition up == 0 is true, but it has not yet held for the full 1m, so nothing has been sent to Alertmanager yet.

Once the condition has held for a minute, the InstanceDown alert moves to the FIRING state, which means Prometheus has sent it to Alertmanager.

Configuring Slack Receiver on Alert Manager

In this section, I will show you how to configure Slack as an Alertmanager receiver, so that Alertmanager sends messages to your Slack workspace when a Prometheus target is DOWN.

To receive notifications via Slack, you need to be a member of a Slack workspace. If you are not part of one, or you want to test this in a separate workspace, you can quickly create a new one.

To set up alerting in your Slack workspace, you’re going to need a Slack API URL. Go to Slack -> Administration -> Manage apps.

In the Manage apps directory, search for Incoming WebHooks and add it to your Slack workspace.

Next, choose the channel where you'd like to receive notifications from Alertmanager (I've created the #citizix-alerts channel). After you confirm and add the Incoming WebHooks integration, the webhook URL (which is your Slack API URL) is displayed. Copy it.
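Before wiring the URL into Alertmanager, it can help to confirm that the webhook posts to the right channel. Incoming Webhooks accept a JSON body with a text field, so a single curl POST is enough. The function name slack_webhook_test is hypothetical, and the message text is inserted into the JSON as-is, so keep it free of double quotes and backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: post a test message to a Slack Incoming Webhook.
# Assumption: the text argument contains no double quotes or backslashes (it is not JSON-escaped).
slack_webhook_test() {
  local url="$1" text="${2:-Test message from the Alertmanager setup}"
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"${text}\"}" "$url"
}
```

Call it with the URL you just copied, for example slack_webhook_test 'https://hooks.slack.com/services/...'. Slack normally replies with ok, and the message should appear in the channel you picked.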

Then fill out your alertmanager.yml based on the template below, using the URL you just copied as slack_api_url.

global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
    - channel: '#citizix-alerts'
      send_resolved: true
      text: 'Alert group {{ .GroupLabels.alertname }} triggered'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

In the configuration above, the top-level route sends every alert to slack-notifications, the receiver defined under receivers, which posts to the #citizix-alerts channel. Because send_resolved is true, Slack also receives a message when an alert is resolved.

repeat_interval in the route configuration is another important Alertmanager option. Here it is set to 1h (Alertmanager's built-in default is 4h): once Alertmanager has successfully sent a Slack message for an alert group that is still firing, it waits an hour before sending it again. If you don't want to get messages that often, increase it.
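To see roughly how long the first Slack message can take after a target goes down, you can add up the timers involved. This is a back-of-the-envelope model, not something Prometheus or Alertmanager computes: it ignores network and queueing delays, and it assumes a 15s scrape_interval, which is what the example prometheus.yml sets. The function name worst_case_first_notification is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough worst-case seconds from "target down" to the first notification.
# Model (an approximation):
#   scrape_interval      until a failed scrape records up == 0
# + evaluation_interval  until a rule evaluation sees it and the alert becomes PENDING
# + for                  until the alert becomes FIRING and is sent to Alertmanager
# + group_wait           until Alertmanager sends the first notification for the new group
worst_case_first_notification() {
  local scrape="$1" evaluation="$2" hold="$3" group_wait="$4"
  echo $(( scrape + evaluation + hold + group_wait ))
}
```

With the values in this guide (15s scrape interval, 15s evaluation_interval, for: 1m, group_wait: 30s), worst_case_first_notification 15 15 60 30 prints 120, so expect the first message within about two minutes.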

Now, restart the alertmanager service for the changes to take effect:

sudo systemctl restart alertmanager
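As with Prometheus, a broken alertmanager.yml will keep the service from coming back after a restart. The amtool binary from the tarball you extracted can check the file first. This sketch assumes the /opt/alertmanager layout used in this guide; the function name restart_alertmanager_if_valid is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart Alertmanager only if amtool accepts its configuration.
# Assumption: amtool and alertmanager.yml live in the install directory (default /opt/alertmanager).
restart_alertmanager_if_valid() {
  local dir="${1:-/opt/alertmanager}"
  if "${dir}/amtool" check-config "${dir}/alertmanager.yml"; then
    sudo systemctl restart alertmanager
  else
    echo "not restarting: ${dir}/alertmanager.yml failed validation" >&2
    return 1
  fi
}
```

Because the unit file defines ExecReload (a SIGHUP), sudo systemctl reload alertmanager would also pick up a valid configuration without a full restart.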

You should get a message on Slack shortly, since you stopped node_exporter earlier.
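If you want to test the Slack route again without stopping a real target, you can push a synthetic alert directly to Alertmanager's v2 API (POST /api/v2/alerts). The alert name TestAlert and its labels are made up for this test, the function name send_test_alert is hypothetical, and the sketch assumes curl is installed and Alertmanager listens on 127.0.0.1:9093.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one synthetic alert to Alertmanager's v2 API.
# The labels below (alertname=TestAlert, severity=warning, instance=test) are invented for testing.
send_test_alert() {
  local base_url="${1:-http://127.0.0.1:9093}"
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data '[{"labels":{"alertname":"TestAlert","severity":"warning","instance":"test"},"annotations":{"summary":"Test alert sent with curl"}}]' \
    "${base_url}/api/v2/alerts"
}
```

Since the route groups by alertname, TestAlert forms its own group and should reach Slack after group_wait (30s here). The alert carries no end time, so Alertmanager treats it as resolved once resolve_timeout (1m here) passes without an update.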

That is it

In this article, we have learnt how to install and configure Alertmanager on a Linux server, and how to configure Alertmanager and Prometheus to send Slack notifications when a Prometheus target is DOWN.
