How to Monitor Multiple Servers With Grafana, InfluxDB & Telegraf

In this guide, we are going to learn how to monitor multiple Linux servers with Grafana, InfluxDB and Telegraf.

Telegraf collects metrics on each server and periodically sends them to InfluxDB. Grafana then connects to InfluxDB and presents the data in visually appealing dashboards.

Telegraf is a lightweight agent that collects, processes and sends the metrics of each machine we want to monitor to our database, InfluxDB.

InfluxDB is the time series database in which we store the metrics sent by the agents. It is designed to handle high write and query loads.

Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.


Prerequisites

  • Grafana up and running
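  • InfluxDB 1.x up and running (the influx commands and InfluxQL statements below target the 1.x series)
  • Network access from every monitored server to the InfluxDB server on port 8086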

Create the Database with InfluxDB

We need a database to store the metrics of the servers we want to monitor. On the InfluxDB server, open the InfluxDB shell with the following command:

influx -host localhost -port 8086

Create a database to store the data:

CREATE DATABASE citisrv_data WITH DURATION 60d;
  • DURATION: how long the data of the monitored servers is kept. In this case it is 60 days; older data is dropped automatically.
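If you prefer to script this step, for example when rebuilding the monitoring server, you can run the same statement without opening the interactive shell. This is a sketch that assumes the InfluxDB 1.x influx CLI and its -execute flag. create_metrics_db and validate_duration are hypothetical helper names, and the duration check only covers a common subset of InfluxQL duration units.

```shell
#!/bin/sh
# Hypothetical helpers (not part of InfluxDB): create the metrics database
# non-interactively with the InfluxDB 1.x "influx" CLI.

# validate_duration DURATION
# Succeeds for INF or an integer followed by a common InfluxQL unit
# (ns, u, ms, s, m, h, d, w). InfluxDB still applies its own checks.
validate_duration() {
    [ "$1" = "INF" ] && return 0
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(ns|u|ms|s|m|h|d|w)$'
}

# create_metrics_db DB DURATION [HOST] [PORT]
create_metrics_db() {
    db="$1"
    duration="$2"
    host="${3:-localhost}"
    port="${4:-8086}"

    if ! validate_duration "$duration"; then
        echo "invalid duration: $duration" >&2
        return 1
    fi

    # Double quotes make the database name an InfluxQL identifier.
    influx -host "$host" -port "$port" \
        -execute "CREATE DATABASE \"$db\" WITH DURATION $duration"
}
```

For example, create_metrics_db citisrv_data 60d runs the same CREATE DATABASE statement as above against localhost:8086.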

To list all databases:

SHOW DATABASES;
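To check for the database from a script instead of reading the list by eye, a hypothetical db_exists helper could look like this. It assumes the CLI's default column output, which prints each database name on its own line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DB appears in the SHOW DATABASES output.
# Assumes the influx CLI's default column output, where every database
# name is printed on a line of its own.
db_exists() {
    db="$1"
    host="${2:-localhost}"
    port="${3:-8086}"
    # -F: fixed string, -q: quiet, -x: the whole line must match.
    influx -host "$host" -port "$port" -execute 'SHOW DATABASES' \
        | grep -Fqx -- "$db"
}
```

Usage: db_exists citisrv_data && echo 'database present'. To confirm the retention period, SHOW RETENTION POLICIES ON citisrv_data should list the autogen policy with a duration of 1440h0m0s, which is 60 days.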

The database is now created, so we can move on to installing and configuring Telegraf.

Telegraf

Telegraf collects the metrics on each server and sends them to InfluxDB.

Install Telegraf Agent on each Server

Debian-based

### Install Telegraf Agent
sudo apt install telegraf -y

sudo systemctl start telegraf
sudo systemctl enable telegraf

Check the status to confirm that Telegraf is running:

sudo systemctl status telegraf
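The goal is to monitor several servers, and checking each one by hand quickly becomes tedious. The sketch below loops over the servers via SSH and relies on systemctl is-active, which prints the unit state and exits 0 only when the unit is active. It assumes key-based SSH access and that your user can run systemctl on each host. check_telegraf_hosts is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report whether the telegraf service is active on each
# host given as an argument. Assumes key-based SSH access to every server.
check_telegraf_hosts() {
    failed=0
    for host in "$@"; do
        # BatchMode avoids hanging on a password prompt; "|| true" keeps the
        # printed state even when systemctl exits non-zero (e.g. "inactive").
        state="$(ssh -o BatchMode=yes "$host" systemctl is-active telegraf 2>/dev/null || true)"
        if [ "$state" = "active" ]; then
            echo "$host: active"
        else
            # An empty state means SSH itself failed.
            echo "$host: ${state:-unreachable}"
            failed=1
        fi
    done
    return "$failed"
}
```

For example: check_telegraf_hosts server2.example.com server3.example.com (placeholder hostnames). The function returns non-zero if any server is not active, so you can use it in a cron job or a scripted health check.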

For other systems, download the latest release for your platform from https://github.com/influxdata/telegraf/releases/

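CentOS (the package below is Telegraf 1.19.2 for x86_64; substitute the current release from the releases page):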

sudo dnf install -y https://dl.influxdata.com/telegraf/releases/telegraf-1.19.2-1.x86_64.rpm
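If your fleet mixes Debian-based and CentOS servers, you can pick the right install command from /etc/os-release, whose ID and ID_LIKE fields name the distribution and its family. This hypothetical telegraf_install_cmd helper only prints the command from this guide rather than running it, so you can review it first. The RPM URL is the 1.19.2 x86_64 package shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the install command this guide uses for the
# distribution described by an os-release file (default /etc/os-release).
telegraf_install_cmd() {
    release_file="${1:-/etc/os-release}"
    # os-release consists of shell-style KEY=value lines. Source it inside the
    # command substitution (a subshell) so its variables do not leak out.
    family="$(unset ID ID_LIKE; . "$release_file" && echo "$ID $ID_LIKE")"
    case " $family " in
        *" debian "*|*" ubuntu "*)
            echo "sudo apt install -y telegraf" ;;
        *" rhel "*|*" centos "*|*" fedora "*)
            # Version-pinned x86_64 package from this guide; adjust as needed.
            echo "sudo dnf install -y https://dl.influxdata.com/telegraf/releases/telegraf-1.19.2-1.x86_64.rpm" ;;
        *)
            echo "unsupported distribution: $family" >&2
            return 1 ;;
    esac
}
```

Run it on each server and execute the printed command once you have reviewed it.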

Start the service and enable it at boot:

sudo systemctl start telegraf
sudo systemctl enable telegraf

Check the status:

sudo systemctl status telegraf

You should see output similar to this:

$ sudo systemctl status telegraf

● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-08-03 20:39:35 UTC; 3s ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 545982 (telegraf)
    Tasks: 7 (limit: 23492)
   Memory: 25.6M
   CGroup: /system.slice/telegraf.service
           └─545982 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Aug 03 20:39:35 cloudsrv.citizix.com systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: time="2021-08-03T20:39:35Z" level=error msg="failed to create cache directory. >
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: time="2021-08-03T20:39:35Z" level=error msg="failed to open. Ignored. open /etc>
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Starting Telegraf 1.19.2
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded inputs: cpu disk diskio kernel mem processes swa>
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded aggregators:
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded processors:
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded outputs: influxdb
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Tags enabled: host=cloudsrv.citizix.com
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"cl>

The configuration file is located at /etc/telegraf/telegraf.conf:

sudo vim /etc/telegraf/telegraf.conf

Configure how often Telegraf collects metrics (interval) and how often it writes them to InfluxDB (flush_interval). The following sets both to 30 seconds:

[agent]
  ## Default data collection interval for all inputs
  interval = "30s"
  flush_interval = "30s"
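The interval also determines how much data the 60-day retention period holds. Each series (one field such as usage_user for one tag set, for example cpu-total on one host) gains one point per interval. At 30 seconds, a series therefore holds up to 60 × 86400 / 30 = 172,800 points. The hypothetical points_retained helper below does this arithmetic for other values.

```shell
#!/bin/sh
# Hypothetical helper: number of points one series accumulates over the
# retention period at a given collection interval.
# Usage: points_retained INTERVAL_SECONDS RETENTION_DAYS
points_retained() {
    interval="$1"
    days="$2"
    [ "$interval" -gt 0 ] 2>/dev/null || { echo "interval must be > 0" >&2; return 1; }
    # 86400 seconds per day; integer division drops any partial interval.
    echo $(( days * 86400 / interval ))
}
```

points_retained 10 60 prints 518400. Moving from Telegraf's default 10s interval to 30s therefore stores a third as many points per series.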

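Next, point the InfluxDB output at Server1, our InfluxDB server, using its IP address and port, and set the database we just created. The address 127.0.0.1 works only when Telegraf runs on the InfluxDB server itself; on every other server, use Server1's IP address instead: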

[[outputs.influxdb]]
  ## Use Server1's IP address on servers other than the InfluxDB host
  urls = ["http://127.0.0.1:8086"]
  database = "citisrv_data"
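Before restarting Telegraf on a server other than Server1, it is worth confirming that the server can reach InfluxDB at all. A firewall blocking port 8086 is a common cause of missing data. InfluxDB 1.x answers GET /ping with HTTP 204 No Content, and the hypothetical influx_reachable helper below checks for that status with curl.

```shell
#!/bin/sh
# Hypothetical helper: confirm this server can reach InfluxDB 1.x, which
# answers GET /ping with HTTP 204 No Content.
# Usage: influx_reachable URL
influx_reachable() {
    url="${1%/}"   # drop one trailing slash, if present
    # curl prints 000 as the status code when the connection fails.
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url/ping" || true)"
    if [ "$code" = "204" ]; then
        echo "InfluxDB reachable at $url"
    else
        echo "InfluxDB not reachable at $url (HTTP status: ${code:-none})" >&2
        return 1
    fi
}
```

For example, run influx_reachable http://<influxdb-server-ip>:8086, replacing the placeholder with Server1's IP address.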

Save the file, then restart Telegraf so the changes take effect and check its status:

sudo systemctl restart telegraf
sudo systemctl status telegraf

Output

[root@cloudsrv ~]# systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-08-03 20:47:05 UTC; 15s ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 546120 (telegraf)
    Tasks: 7 (limit: 23492)
   Memory: 24.6M
   CGroup: /system.slice/telegraf.service
           └─546120 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d

Aug 03 20:47:05 cloudsrv.citizix.com systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: time="2021-08-03T20:47:05Z" level=error msg="failed to create cache directory. >
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: time="2021-08-03T20:47:05Z" level=error msg="failed to open. Ignored. open /etc>
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Starting Telegraf 1.19.2
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded inputs: cpu disk diskio kernel mem processes swa>
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded aggregators:
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded processors:
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded outputs: influxdb
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Tags enabled: host=cloudsrv.citizix.com
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"cl>
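systemctl status shortens long log lines (the > at the end of several lines above), which hides most of the error text. The journal keeps the complete lines. This sketch assumes systemd-journald and filters for the two error formats visible above: level=error, and Telegraf's own E! prefix. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for reading Telegraf errors from the systemd journal.

# filter_telegraf_errors: keep only error lines read from stdin. Matches the
# "level=error" format and Telegraf's own "E!" level prefix.
filter_telegraf_errors() {
    grep -E 'level=error| E! '
}

# show_telegraf_errors [LINES]: search the last LINES (default 200) journal
# entries of the telegraf unit for errors.
show_telegraf_errors() {
    journalctl -u telegraf --no-pager -n "${1:-200}" | filter_telegraf_errors
}
```

For example, show_telegraf_errors 500 searches the last 500 journal lines of the telegraf unit.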

Go back to Server1 and open the InfluxDB shell again to check that the data is arriving:

influx -host localhost -port 8086

Select the database that we are using to store the metrics:

use citisrv_data

To verify that data is arriving, query one of the metrics the Telegraf agent is sending. The default cpu input writes the user CPU percentage to the usage_user field of the cpu measurement:

SELECT host, usage_user FROM cpu WHERE cpu = 'cpu-total' ORDER BY time DESC LIMIT 5
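With several servers reporting into the same database, each Telegraf agent adds a host tag (visible as "Tags enabled: host=..." in the status output above). You can use that tag to check, server by server, that data is arriving. This sketch assumes the InfluxDB 1.x CLI and Telegraf's default cpu input (measurement cpu, field usage_user, tags host and cpu). The helper names and the DB, INFLUX_HOST and INFLUX_PORT variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: see which servers report into citisrv_data and their
# recent average user CPU. Assumes the InfluxDB 1.x "influx" CLI and
# Telegraf's default cpu input (measurement cpu, tag host, field usage_user).
DB="${DB:-citisrv_data}"

influx_query() {
    influx -host "${INFLUX_HOST:-localhost}" -port "${INFLUX_PORT:-8086}" \
        -database "$DB" -execute "$1"
}

# One row per server that has written cpu metrics.
list_reporting_hosts() {
    influx_query 'SHOW TAG VALUES FROM "cpu" WITH KEY = "host"'
}

# Mean user CPU per host over the last N minutes (default 10).
cpu_by_host() {
    influx_query "SELECT mean(\"usage_user\") FROM \"cpu\" WHERE \"cpu\" = 'cpu-total' AND time > now() - ${1:-10}m GROUP BY \"host\""
}
```

list_reporting_hosts should print one row per server. A server missing from the list is not writing data.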

If the query returns rows, Telegraf is sending data and it is being stored in this database.

Citizix Ltd