The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts. For example, Prometheus can generate an alert when a target is unavailable and send it to Alertmanager, which then notifies you that the target is down. Prometheus can send alerts to Alertmanager based on any Prometheus metric, so the possibilities are limitless.
In this guide, we will learn how to install and set up Alertmanager on Linux. We will also learn how to configure Prometheus and Alertmanager to send you a Slack notification when a Prometheus target is down (unavailable).
Prerequisites
You need to have a working Prometheus setup before proceeding. Check out the following guides if you need help setting up Prometheus:
- How to Set up Prometheus Node exporter in Kubernetes
- How To Install and Configure Prometheus On a Linux Server
- How To Monitor Linux Servers Using Prometheus Node Exporter
- How to run Prometheus with docker and docker-compose
Installing Alert Manager
Alertmanager is available as a release tarball from the Prometheus downloads page. Head over there and grab the latest version. In my case, I am using a Linux server, so this is the command that will download the package:
$ curl -LO https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
$ ls
alertmanager-0.25.0.linux-amd64.tar.gz
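Before extracting, it is worth checking the archive against the checksum published in the sha256sums.txt file on the releases page. The snippet below sketches the sha256sum -c workflow with a stand-in file and a locally computed hash; in practice, the expected value must come from the release page, not be computed locally:

```shell
# Stand-in for the downloaded archive; replace with the real tarball.
printf 'demo archive contents' > alertmanager-demo.tar.gz

# EXPECTED_SHA256 is a placeholder: copy the real hash from sha256sums.txt
# on the releases page. Here we compute it from the stand-in file so the
# sketch is self-contained.
EXPECTED_SHA256=$(sha256sum alertmanager-demo.tar.gz | awk '{print $1}')

# sha256sum -c reads "<hash>  <filename>" lines and reports OK or FAILED.
echo "${EXPECTED_SHA256}  alertmanager-demo.tar.gz" | sha256sum -c -
# → alertmanager-demo.tar.gz: OK
```

If the file were corrupted in transit, the last command would print FAILED and exit non-zero.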
Once the download is complete, extract it and move it to the /opt/alertmanager directory.
$ tar -xzvf alertmanager-0.25.0.linux-amd64.tar.gz
alertmanager-0.25.0.linux-amd64/
alertmanager-0.25.0.linux-amd64/alertmanager.yml
alertmanager-0.25.0.linux-amd64/NOTICE
alertmanager-0.25.0.linux-amd64/amtool
alertmanager-0.25.0.linux-amd64/alertmanager
alertmanager-0.25.0.linux-amd64/LICENSE
$ sudo mv -v alertmanager-0.25.0.linux-amd64 /opt/alertmanager
The directory contains two important files: the alertmanager binary and the alertmanager.yml configuration file with the initial configuration. Since we will be running the application as the prometheus user (which was created as part of the Prometheus setup), make sure that user owns the directory:
sudo chown -Rfv prometheus:prometheus /opt/alertmanager
Creating a Data Directory
Alertmanager needs a directory where it can store its data. As you will be running Alertmanager as the prometheus system user, that user must have read, write, and execute permissions on the data directory. You can create the data/ directory inside /opt/alertmanager/ as follows:
sudo mkdir -v /opt/alertmanager/data
sudo chown -Rfv prometheus:prometheus /opt/alertmanager/data
Creating a systemd Service Unit for Alertmanager
To manage the service, we will use systemd, which allows us to start, stop, and restart the service, and to enable it to start automatically at boot. Create a service file at the following path:
sudo vim /etc/systemd/system/alertmanager.service
Then add the following content to the file:
[Unit]
Description=Alertmanager for prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/alertmanager/alertmanager \
--config.file=/opt/alertmanager/alertmanager.yml \
--storage.path=/opt/alertmanager/data
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
Save and exit the file.
For the systemd changes to take effect, run the following command:
sudo systemctl daemon-reload
Now, start the alertmanager service with the following command:
sudo systemctl start alertmanager
Confirm that the service is running as expected by checking its status:
$ sudo systemctl status alertmanager
● alertmanager.service - Alertmanager for prometheus
Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-02-09 10:02:37 UTC; 2h 27min ago
Main PID: 599774 (alertmanager)
Tasks: 8 (limit: 23406)
Memory: 27.8M
CGroup: /system.slice/alertmanager.service
└─599774 /opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml --storage.path=/opt/alertmanager/data
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.309Z caller=main.go:240 level=info msg="Starting Alertmanager" version="(version=0.25.0, branch=HEAD, revision=258fab7cdd551f2cf251ed0348f0ad7289aee789)"
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.309Z caller=main.go:241 level=info build_context="(go=go1.19.4, [email protected], date=20221222-14:51:36)"
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.327Z caller=cluster.go:185 level=info component=cluster msg="setting advertise address explicitly" addr=10.18.0.10 port=9094
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.340Z caller=cluster.go:681 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.382Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/opt/alertmanager/alertmanager.yml
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.383Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/opt/alertmanager/alertmanager.yml
Feb 09 10:02:37 monitoring alertmanager[599774]: ts=2023-02-09T10:02:37.388Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9093
Add the alertmanager service to the system startup so that it automatically starts on boot:
sudo systemctl enable alertmanager
Configuring Prometheus
Now, you have to configure Prometheus to use Alertmanager. You can also monitor Alertmanager itself with Prometheus. We will learn how to do both in this section.
To scrape Alertmanager metrics, we need to add a job to the scrape_configs section of the Prometheus configuration file. In my case, Alertmanager and the Prometheus server are running on the same host, so I will use 127.0.0.1:9093 as the target; otherwise, substitute 127.0.0.1 with your Alertmanager host IP. You can find the host IP using this command:
hostname -I
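On multi-homed hosts, hostname -I prints several addresses separated by spaces. A small helper sketch to keep only the first one (the sample addresses below are made up):

```shell
# Print only the first address from a "hostname -I"-style list.
first_ip() {
  # Re-split the argument on whitespace and keep the first field.
  set -- $1
  printf '%s\n' "$1"
}

# Illustrative input; on a real host you would pass "$(hostname -I)".
first_ip '10.18.0.10 172.17.0.1'   # → 10.18.0.10
```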
This is how the configs in my prometheus.yml file look after adding Alertmanager:
...
# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
  - job_name: 'alertmanager'
    static_configs:
      - targets: ['127.0.0.1:9093']
...
Also, type in the IP address and port number of Alertmanager in the alerting > alertmanagers section of the prometheus.yml file:
...
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093
...
For the changes to take effect, restart the prometheus service as follows:
sudo systemctl restart prometheus
Visit the URL http://<server_ip>:9090/targets from your favorite web browser, and you should see that alertmanager is in the UP state. So, Prometheus can access Alertmanager just fine.
Creating a Prometheus Alert Rule
On Prometheus, you can use the up expression to find the state of the targets added to Prometheus in the graph search section.
The targets that are in the UP state (running and accessible to Prometheus) will have the value 1, and targets that are not in the UP (or DOWN) state (not running or inaccessible to Prometheus) will have the value 0.
If you stop one of the targets, say node_exporter:
sudo systemctl stop node_exporter
The UP value of that target in Prometheus should be 0. So, you can use the up == 0 expression to list only the targets that are not running or are inaccessible to Prometheus.
This expression can be used to create a Prometheus Alert and send alerts to Alert Manager when one or more targets are not running or inaccessible to Prometheus.
To create a Prometheus alert, create a new file rules.yml in the /opt/prometheus/ directory as follows:
sudo vim /opt/prometheus/rules.yml
Now, type the following lines into the rules.yml file.
groups:
  - name: Instances
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        # Labels - additional labels to be attached to the alert
        labels:
          severity: 'critical'
        # Prometheus templates apply here in the annotation and label fields of the alert.
        annotations:
          summary: 'Instance {{ $labels.instance }} down'
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
Here, the alert InstanceDown will be fired when targets are not running or inaccessible to Prometheus (that is up == 0) for a minute (1m).
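For contrast, a hypothetical warning-level rule could live in the same group. The alert name and threshold below are purely illustrative, and the severity: 'warning' label is chosen to match the inhibition rules configured in Alertmanager later in this guide:

```yaml
      # Hypothetical example: warn when a target's up series changes state
      # more than twice in 10 minutes, i.e. the target is flapping.
      - alert: InstanceFlapping
        expr: changes(up[10m]) > 2
        labels:
          severity: 'warning'
        annotations:
          summary: 'Instance {{ $labels.instance }} is flapping'
```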
Now, update the Prometheus configuration file /opt/prometheus/prometheus.yml as follows:
...
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/opt/prometheus/rules.yml"
...
Another important option in the prometheus.yml file is evaluation_interval. Every evaluation_interval, Prometheus checks whether any rules match. The default is 15s (15 seconds), so the alert rules in the rules.yml file will be checked every 15 seconds.
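For reference, evaluation_interval lives in the global section of prometheus.yml, next to scrape_interval; a minimal sketch with the commonly shipped values:

```yaml
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often alerting/recording rules are evaluated
```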
For the changes to take effect, restart the prometheus service:
sudo systemctl restart prometheus
Now, navigate to the URL http://<server_ip>:9090/rules from your favorite web browser, and you should see the rule InstanceDown that you've just added.
Navigate to the URL http://<server_ip>:9090/alerts, and you should see the state of the alert InstanceDown.
As you’ve stopped node_exporter earlier, the alert is active, and it is waiting to be sent to Alertmanager. After a minute has passed, the alert InstanceDown should be in the FIRING state, which means it has been sent to Alertmanager.
Configuring Slack Receiver on Alert Manager
In this section, I will show you how to configure Slack as the Alert Manager receiver so that you can get messages on your Slack account from Alert Manager if a Prometheus target is DOWN.
If you want to receive notifications via Slack, you should be part of a Slack workspace. If you are currently not part of any Slack workspace, or you want to test this out in a separate workspace, you can quickly create one here.
To set up alerting in your Slack workspace, you’re going to need a Slack API URL. Go to Slack -> Administration -> Manage apps.
In the Manage apps directory, search for Incoming WebHooks and add it to your Slack workspace.
Next, specify in which channel you’d like to receive notifications from Alertmanager (I’ve created a #citizix-alerts channel). After you confirm and add the Incoming WebHooks integration, the webhook URL (which is your Slack API URL) is displayed. Copy it.
Then you need to modify the alertmanager.yml file. Fill out your alertmanager.yml based on the template below, using the URL that you have just copied as the slack_api_url.
global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/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'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#citizix-alerts'
        send_resolved: true
        text: 'Alert group {{ .GroupLabels.alertname }} triggered'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
In the above configs, we have set the Alertmanager receiver to slack-notifications, the receiver we defined under receivers, and Alertmanager will use it from now on.
repeat_interval in the route configuration is also an important Alertmanager option. Here, repeat_interval is set to 1h (1 hour): if Alertmanager has successfully sent you a message on Slack, it will wait an hour before sending another one for the same alert group. If you don’t want to get notifications very frequently, you can increase it.
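If some alerts deserve a different cadence, routes can also be nested. A hedged sketch of a child route (the severity matcher mirrors the label set in rules.yml; the 30m value is illustrative):

```yaml
route:
  receiver: 'slack-notifications'
  repeat_interval: 1h
  routes:
    # Hypothetical child route: re-notify critical alerts more often.
    - match:
        severity: 'critical'
      repeat_interval: 30m
```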
Now, restart the alertmanager service for the changes to take effect:
sudo systemctl restart alertmanager
You should get a message on Slack shortly; you stopped node_exporter earlier, remember?
That is it!
In this article, we have learnt how to install and configure Alertmanager on a Linux server. We have also learnt how to configure Alertmanager and Prometheus to send Slack notifications when a Prometheus target is DOWN.