In this guide, we will learn how to monitor multiple Linux servers with Grafana, InfluxDB and Telegraf.
Telegraf collects the metrics and sends them periodically to InfluxDB; Grafana then connects to InfluxDB and presents the data visually.
Telegraf is a lightweight agent that collects, processes and sends the metrics of each machine we want to monitor to our database, InfluxDB.
InfluxDB is the database in which we will store the metrics sent by the agent. It is designed to withstand high write and read loads.
Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.
Prerequisites
- Grafana up and running
- InfluxDB up and running
Create the Database with InfluxDB
We need to create a database to store the metrics of the servers that we want to monitor.
On the InfluxDB machine, open the InfluxDB shell with the following command:

```shell
influx -host localhost -port 8086
```

Create a database to store the data:

```sql
CREATE DATABASE citisrv_data WITH DURATION 60d;
```

- DURATION: how long the data of the monitored servers is kept. In this case, 60 days.
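The same step can be scripted instead of typed at the interactive prompt. The sketch below is a minimal example that assumes the InfluxDB 1.x `influx` CLI and its `-execute` flag; the function name `create_db_statement` is made up for illustration, and the call that talks to the server is left commented out.

```shell
#!/bin/sh
# Build the InfluxQL statement for a database whose data is kept for a
# given number of days. The function name is illustrative only.
create_db_statement() {
    db_name=$1
    days=$2
    printf 'CREATE DATABASE %s WITH DURATION %sd\n' "$db_name" "$days"
}

# Print the statement used in this guide.
create_db_statement citisrv_data 60

# To run it non-interactively (assumes InfluxDB 1.x on localhost:8086):
# influx -host localhost -port 8086 -execute "$(create_db_statement citisrv_data 60)"
```

Keeping the statement in a function makes it easy to reuse the same retention period if more databases are added later.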
To list all databases, run SHOW DATABASES at the influx prompt.
Now that the database is created, we can install and configure Telegraf.
Telegraf
Telegraf is the agent that sends the data to InfluxDB.
Install Telegraf Agent on each Server
Debian-based systems:

```shell
# Install the Telegraf agent
sudo apt install telegraf -y
sudo systemctl start telegraf
sudo systemctl enable telegraf
```

Check the status to confirm that Telegraf is running:

```shell
sudo systemctl status telegraf
```
For other systems, you can grab the latest release for your server from https://github.com/influxdata/telegraf/releases/
CentOS:

```shell
sudo dnf install -y https://dl.influxdata.com/telegraf/releases/telegraf-1.19.2-1.x86_64.rpm
```

Start the service:

```shell
sudo systemctl start telegraf
```

Check the status:

```shell
sudo systemctl status telegraf
```
Then you should see this
```text
$ sudo systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-08-03 20:39:35 UTC; 3s ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 545982 (telegraf)
    Tasks: 7 (limit: 23492)
   Memory: 25.6M
   CGroup: /system.slice/telegraf.service
           └─545982 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
Aug 03 20:39:35 cloudsrv.citizix.com systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: time="2021-08-03T20:39:35Z" level=error msg="failed to create cache directory. >
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: time="2021-08-03T20:39:35Z" level=error msg="failed to open. Ignored. open /etc>
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Starting Telegraf 1.19.2
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded inputs: cpu disk diskio kernel mem processes swa>
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded aggregators:
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded processors:
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Loaded outputs: influxdb
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! Tags enabled: host=cloudsrv.citizix.com
Aug 03 20:39:35 cloudsrv.citizix.com telegraf[545982]: 2021-08-03T20:39:35Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"cl>
```
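Because the agent has to be installed on every monitored server, the per-distribution commands from this section can be wrapped in a small helper. The sketch below only prints the ssh commands it would run, so it is safe to try; the host names are placeholders, the function names are invented for this example, and the distribution IDs are assumed to match the ID= value in /etc/os-release.

```shell
#!/bin/sh
# Return the install command from this guide for a distribution ID.
# Only the two families covered in this section are handled.
install_command() {
    case "$1" in
        debian|ubuntu)
            echo "sudo apt install telegraf -y" ;;
        centos)
            echo "sudo dnf install -y https://dl.influxdata.com/telegraf/releases/telegraf-1.19.2-1.x86_64.rpm" ;;
        *)
            echo "unsupported distribution: $1" >&2
            return 1 ;;
    esac
}

# Print (do not run) one ssh command per "host:distro" argument.
plan_install() {
    for entry in "$@"; do
        host=${entry%%:*}
        distro=${entry##*:}
        cmd=$(install_command "$distro") || return 1
        echo "ssh $host '$cmd && sudo systemctl enable --now telegraf'"
    done
}

# Placeholder hosts; replace with the servers you want to monitor.
plan_install web1.example.com:debian db1.example.com:centos
```

Printing the plan first makes it possible to review the commands before piping them to a shell or pasting them one by one.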
The configuration file is located at /etc/telegraf/telegraf.conf. Open it in an editor:

```shell
sudo vim /etc/telegraf/telegraf.conf
```

Configure how often the agent collects the data and sends it.
The following sets both the collection interval and the flush interval to 30 seconds:

```toml
[agent]
  ## Default data collection interval for all inputs
  interval = "30s"
  flush_interval = "30s"
```
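Configure the address and port of Server1, our InfluxDB server, and add the database that we just created. Note that 127.0.0.1 only works when Telegraf runs on the InfluxDB host itself; on the other servers, use the address of Server1 instead.

```toml
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "citisrv_data"
```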
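When the same settings have to be applied on several servers, it can help to generate them rather than edit each file by hand. The sketch below renders the two sections discussed above as a TOML fragment; the function name is invented for this example and 192.0.2.10 is a documentation-only placeholder for the InfluxDB server address.

```shell
#!/bin/sh
# Render the agent and output settings from this guide as a TOML fragment.
# Arguments: collection interval, InfluxDB URL, database name.
render_telegraf_settings() {
    interval=$1
    influx_url=$2
    database=$3
    cat <<EOF
[agent]
  interval = "$interval"
  flush_interval = "$interval"

[[outputs.influxdb]]
  urls = ["$influx_url"]
  database = "$database"
EOF
}

# Placeholder address; use the real address of your InfluxDB server.
render_telegraf_settings 30s http://192.0.2.10:8086 citisrv_data
```

The output is only a fragment to compare against or merge into telegraf.conf. After editing the real file, running `telegraf --config /etc/telegraf/telegraf.conf --test` gathers the inputs once and prints the metrics to the terminal, which is a quick way to catch syntax errors before restarting the service.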
Restart Telegraf to apply the changes, then check its status:

```shell
sudo systemctl restart telegraf
sudo systemctl status telegraf
```
Output
```text
[root@cloudsrv ~]# systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-08-03 20:47:05 UTC; 15s ago
     Docs: https://github.com/influxdata/telegraf
 Main PID: 546120 (telegraf)
    Tasks: 7 (limit: 23492)
   Memory: 24.6M
   CGroup: /system.slice/telegraf.service
           └─546120 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
Aug 03 20:47:05 cloudsrv.citizix.com systemd[1]: Started The plugin-driven server agent for reporting metrics into InfluxDB.
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: time="2021-08-03T20:47:05Z" level=error msg="failed to create cache directory. >
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: time="2021-08-03T20:47:05Z" level=error msg="failed to open. Ignored. open /etc>
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Starting Telegraf 1.19.2
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded inputs: cpu disk diskio kernel mem processes swa>
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded aggregators:
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded processors:
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Loaded outputs: influxdb
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! Tags enabled: host=cloudsrv.citizix.com
Aug 03 20:47:05 cloudsrv.citizix.com telegraf[546120]: 2021-08-03T20:47:05Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"cl>
```
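The last log line reports the interval the agent is actually using, and in the captured output it reads Interval:10s rather than the 30s set in the file, so it is worth checking this line after a restart. The sketch below pulls the value out of the log; the function name is invented for this example, and the commented command assumes that systemd's journal holds the Telegraf logs.

```shell
#!/bin/sh
# Read Telegraf log lines on stdin and print the collection interval
# reported in the "[agent] Config:" startup line.
agent_interval_from_log() {
    sed -n 's/.*\[agent] Config: Interval:\([0-9a-z.]*\),.*/\1/p'
}

# Sample line copied from the status output in this guide (truncated there too).
sample='2021-08-03T20:47:05Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"cl>'
printf '%s\n' "$sample" | agent_interval_from_log

# Against a live service, keeping only the most recent start:
# journalctl -u telegraf --no-pager | agent_interval_from_log | tail -n 1
```

If the printed value does not match the configuration, the edit was probably made in a different file or section than the one the service loads.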
Go back to Server1 and open the InfluxDB shell again to check that the data is arriving:

```shell
influx -host localhost -port 8086
```
Select the database that we are using to store the metrics by running USE citisrv_data at the influx prompt.
To verify that data is arriving, run a query against the metrics that the Telegraf agent is sending. On Linux, the cpu input shown in the "Loaded inputs" log line writes to the cpu measurement (win_cpu only exists for Windows agents):

```sql
SELECT usage_user FROM cpu LIMIT 10
```

If the query returns rows, the metrics are being stored in this database.
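With several servers reporting into the same database, a per-host query is usually more useful than a raw dump. The sketch below builds one; it assumes Telegraf's Linux cpu input (measurement `cpu`, field `usage_user`, tag `cpu` set to `cpu-total` for the aggregate) and the `host` tag shown in the "Tags enabled" log line. The function name is invented for this example, and the call to the InfluxDB 1.x `influx` CLI is left commented out.

```shell
#!/bin/sh
# Build an InfluxQL query for the mean user CPU per host over a time window.
# Assumes measurement "cpu", field "usage_user" and the "cpu-total" tag value.
cpu_query() {
    window=$1
    printf "SELECT mean(usage_user) FROM cpu WHERE cpu = 'cpu-total' AND time > now() - %s GROUP BY host\n" "$window"
}

# Print the query for the last five minutes.
cpu_query 5m

# Non-interactive run against the database created in this guide:
# influx -host localhost -port 8086 -database citisrv_data -execute "$(cpu_query 5m)"
```

Each monitored server should appear as its own group in the result; a host that is missing has most likely not sent any data within the window.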