How to Set Up Prometheus Monitoring on a Kubernetes Cluster

In this guide, we will learn how to set up Prometheus for monitoring on a Kubernetes cluster. This setup collects node, pod, and service metrics automatically using Prometheus service discovery configurations.

# About Prometheus

Prometheus is a free, open source application used for event monitoring and alerting. It was originally built at SoundCloud and is now a standalone open source project maintained independently of any company.

Prometheus collects and stores its metrics as time series data, i.e. metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels. Metrics are numeric measurements, and a time series means that changes are recorded over time. What users want to measure differs from application to application. For a web server it might be request times; for a database it might be the number of active connections or active queries.

Metric Collection: Prometheus uses the pull model to retrieve metrics over HTTP. For use cases where Prometheus cannot scrape the metrics, there is an option to push them to Prometheus using Pushgateway. One such example is collecting custom metrics from short-lived Kubernetes Jobs and CronJobs.
Metric Endpoint: The systems that you want to monitor using Prometheus should expose their metrics on a /metrics endpoint. Prometheus uses this endpoint to pull the metrics at regular intervals.
PromQL: Prometheus comes with PromQL, a very flexible query language that can be used to query the metrics in the Prometheus dashboard. The Prometheus UI and Grafana also use PromQL queries to visualize metrics.
Prometheus Exporters: Exporters are libraries and servers that convert existing metrics from third-party apps to the Prometheus metrics format. There are many official and community Prometheus exporters. One example is the Prometheus node exporter, which exposes Linux system-level metrics in Prometheus format.
TSDB (time-series database): Prometheus uses a TSDB for storing all the data efficiently. By default, all the data gets stored locally. However, to avoid a single point of failure, there are options to integrate remote storage for the Prometheus TSDB.
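
To make the pull model, the metrics endpoint, and the remote storage option above more concrete, here is a minimal sketch of a Prometheus configuration. The job name, the target my-app.example.svc:8080, and the remote write URL are hypothetical placeholders and are not part of the setup built later in this guide.

```yaml
global:
  scrape_interval: 15s            # how often Prometheus pulls metrics from each target
scrape_configs:
  - job_name: 'my-app'            # hypothetical job name
    metrics_path: /metrics        # the default path, spelled out for clarity
    static_configs:
      - targets: ['my-app.example.svc:8080']   # hypothetical target (host:port)
remote_write:
  # Hypothetical remote storage endpoint; samples are still stored locally as well
  - url: 'https://remote-storage.example.com/api/v1/write'
```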

If you would like to run Prometheus on your local machine, check out How to run Prometheus with docker and docker-compose. If you are running Prometheus on a Linux server, check out How To Install and Configure Prometheus On a Linux Server.

Prometheus is often used in conjunction with Alertmanager to set up alerts and Grafana to graph the collected metrics.

# Prometheus Monitoring Setup on Kubernetes

I assume that you have a Kubernetes cluster up and running, with kubectl set up on your workstation.

The latest Prometheus release is available as a Docker image from its official Docker Hub account. We will use that image for the setup.

# Prometheus Kubernetes Manifest Files

We can now create the manifests for our setup.

# Create a Namespace

First, we will create a Kubernetes namespace for all our monitoring components. If you don’t create a dedicated namespace, all the Prometheus Kubernetes deployment objects get deployed in the default namespace.

Save the following in namespace.yaml:

---
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus

Then execute the following command to create a new namespace named prometheus.

kubectl apply -f namespace.yaml

# Create a ClusterRole

Prometheus uses Kubernetes APIs to read all the available metrics from Nodes, Pods, Deployments, etc. For this reason, we need to create an RBAC policy with read access to the required API groups and bind the policy to a service account in the prometheus namespace.

First, create a file named clusterRole.yaml and copy the following RBAC role.

In the role given below, you can see that we have added get, list, and watch permissions to nodes, services, endpoints, pods, and ingresses. The role binding is bound to the default service account in the prometheus namespace. If you have any use case to retrieve metrics from any other object, you need to add that in this cluster role.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: prometheus

Next, create the role using the following command.

kubectl create -f clusterRole.yaml
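
Because the binding above targets the default service account, every pod in the prometheus namespace that runs under that account inherits the same cluster-wide read access. A sketch of one alternative is a dedicated service account. The account name prometheus is an assumption, and this binding would replace the ClusterRoleBinding in clusterRole.yaml.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus            # assumed name for a dedicated account
  namespace: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus            # the dedicated account instead of default
  namespace: prometheus
```

With this variant, the Prometheus pod spec would also need serviceAccountName: prometheus so that the pod actually runs under the new account.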

# Create a Config Map To Externalize Prometheus Configurations

All configurations for Prometheus are part of the prometheus.yml file, and all the alert rules that Prometheus evaluates and forwards to Alertmanager are configured in prometheus.rules.

prometheus.yml: This is the main Prometheus configuration, which holds the scrape configs, service discovery details, Alertmanager targets, and rule file references. Storage location and data retention are set with command-line flags on the Prometheus container.
prometheus.rules: This file contains all the Prometheus alerting rules.

By externalizing Prometheus configs to a Kubernetes config map, you don’t have to rebuild the Prometheus image whenever you need to add or remove a configuration. You only need to update the config map and restart the Prometheus pods to apply the new configuration.

The config map with all the Prometheus scrape configs and alerting rules gets mounted to the Prometheus container in the /etc/prometheus location as the prometheus.yml and prometheus.rules files.
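
Restarting the pods is not the only way to pick up a changed config map. Prometheus can also reload its configuration at runtime if it is started with the --web.enable-lifecycle flag, after which an HTTP POST to the /-/reload endpoint triggers a reload. The fragment below is a sketch of how the container arguments might look under that assumption. It is not part of the deployment used in this guide, and the lifecycle endpoints also allow shutting Prometheus down, so enable them only if access to port 9090 is restricted.

```yaml
# Fragment of a container spec, not a complete manifest
args:
- --config.file=/etc/prometheus/prometheus.yml
- --storage.tsdb.path=/prometheus/
- --web.enable-lifecycle      # enables the /-/reload and /-/quit HTTP endpoints
```

Note that an updated config map can take a short while to appear inside the running container, so a reload sent immediately after the update may still see the old files.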

Create a file called config-map.yaml and add the following file contents:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    name: prometheus
data:
  prometheus.rules: |-
    groups:
    - name: citizix demo alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.monitoring.svc:9093"
    scrape_configs:
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'node-exporter'
          action: keep

      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics

      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name

      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name

Execute the following command to create the config map in Kubernetes.

kubectl create -f config-map.yaml

When the config map is mounted, it creates two files inside the container: prometheus.yml and prometheus.rules.

Note: In Prometheus terms, the config for collecting metrics from a collection of endpoints is called a job.

The prometheus.yml file contains all the configurations to discover pods and services running in the Kubernetes cluster dynamically. We have the following scrape jobs in our Prometheus scrape configuration.

kubernetes-apiservers: It gets all the metrics from the API servers.
kubernetes-nodes: It collects all the Kubernetes node metrics.
kubernetes-pods: Pods are discovered and scraped if the pod metadata carries the prometheus.io/scrape and prometheus.io/port annotations.
kubernetes-cadvisor: Collects all cAdvisor metrics.
kubernetes-service-endpoints: Service endpoints are scraped if the service metadata carries the prometheus.io/scrape and prometheus.io/port annotations. It can be used for black-box monitoring.
node-exporter: Scrapes the endpoints of any service named node-exporter.
kube-state-metrics: Scrapes the kube-state-metrics service at kube-state-metrics.kube-system.svc.cluster.local:8080.
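
As an illustration of the annotations that the kubernetes-pods job looks for, here is a sketch of a pod manifest. The pod name, image, and port 8080 are hypothetical; only the prometheus.io/* annotation keys come from the relabel rules in the config map above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                          # hypothetical pod name
  annotations:
    prometheus.io/scrape: 'true'        # matched by the keep rule in kubernetes-pods
    prometheus.io/port: '8080'          # rewritten into the scrape address
    prometheus.io/path: '/metrics'      # optional, overrides the metrics path
spec:
  containers:
  - name: my-app
    image: example.com/my-app:1.0       # hypothetical image
    ports:
    - containerPort: 8080
```

Annotation values must be strings, which is why true and 8080 are quoted.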

prometheus.rules contains all the alert rules for sending alerts to the Alertmanager.
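
The demo rule in the config map fires as soon as any container reports memory usage, which is handy for testing the alert pipeline but not useful in practice. A sketch of a more targeted rule is shown below. It assumes the cAdvisor metric container_memory_working_set_bytes carries namespace, pod, and container labels, as it does on recent Kubernetes versions. The group name, alert name, and the 1 GiB threshold are arbitrary examples.

```yaml
groups:
- name: pod-memory                      # example group name
  rules:
  - alert: PodHighMemory                # example alert name
    # Working set per pod above 1 GiB (1073741824 bytes); the threshold is arbitrary
    expr: sum by (namespace, pod) (container_memory_working_set_bytes{container!=""}) > 1073741824
    for: 5m                             # must hold for 5 minutes before firing
    labels:
      severity: slack
    annotations:
      summary: "High memory usage in pod {{ $labels.namespace }}/{{ $labels.pod }}"
```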

# Create a Prometheus Deployment

Next, we create a Prometheus deployment. We are using the official Prometheus image from Docker Hub. We are not using any persistent storage volumes for this basic setup, so the collected metrics are lost when the pod is deleted. Please consider persistent storage when setting up Prometheus for production use cases.

In this configuration, we are mounting the Prometheus config map as files inside /etc/prometheus, as explained in the previous section. Save the following content to deployment.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - args:
        - --storage.tsdb.retention.time=24h
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus/
        image: prom/prometheus:v2.37.0
        name: prometheus
        ports:
        - containerPort: 9090
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 500M
        volumeMounts:
        - mountPath: /etc/prometheus/
          name: prometheus-config-volume
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus
        name: prometheus-config-volume
      - emptyDir: {}
        name: prometheus-storage-volume

Create the deployment in the prometheus namespace using the above file.

kubectl create -f deployment.yaml

You can check the created deployment and its pods using the following commands.

kubectl get deployments --namespace=prometheus
kubectl get pods --namespace=prometheus
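
For the persistent storage mentioned at the start of this section, one option is a PersistentVolumeClaim in place of the emptyDir volume. This is a sketch under assumptions: the claim name prometheus-storage and the 20Gi size are arbitrary, and the cluster must have a default storage class that can provision the volume.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage          # assumed claim name
  namespace: prometheus
spec:
  accessModes:
  - ReadWriteOnce                   # a single Prometheus replica mounts the volume
  resources:
    requests:
      storage: 20Gi                 # arbitrary example size
# In deployment.yaml, the emptyDir volume would then become:
#   - name: prometheus-storage-volume
#     persistentVolumeClaim:
#       claimName: prometheus-storage
```

Depending on the storage provider, the Prometheus container, which does not run as root, may also need a pod securityContext with a suitable fsGroup to be able to write to the volume.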

# Connecting To Prometheus Dashboard

You can view the deployed Prometheus dashboard in three different ways.

Using kubectl port forwarding
Exposing the Prometheus deployment as a service with NodePort or a LoadBalancer
Adding an Ingress object if you have an Ingress controller deployed

# Using kubectl port forwarding

Using kubectl port forwarding, you can access a pod from your local workstation using a selected port on your localhost. This method is primarily used for debugging purposes.

First, get the Prometheus pod name.

kubectl get pods --namespace=prometheus

The output will look like the following.

➜ kubectl get pods --namespace=prometheus

NAME                          READY   STATUS    RESTARTS   AGE
prometheus-5bccbcfc94-rbd9g   1/1     Running   0          38m

Execute the following command with your pod name to access Prometheus from localhost port 8080.

Note: Replace prometheus-5bccbcfc94-rbd9g with your pod name.

kubectl port-forward prometheus-5bccbcfc94-rbd9g 8080:9090 -n prometheus

Now, if you open http://localhost:8080 in your browser, you will get the Prometheus home page.

# Exposing Prometheus as a Service [NodePort & LoadBalancer]

To access the Prometheus dashboard over an IP or a DNS name, you need to expose it as a Kubernetes service.

Create a file named service.yaml and copy the following contents. We will expose Prometheus on all Kubernetes node IPs on port 30000.

Note: If you are on AWS, Azure, or Google Cloud, you can use the LoadBalancer service type, which creates a load balancer and automatically points it to the Kubernetes service endpoint.

---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prometheus
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9090'
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: prometheus
      protocol: TCP
      port: 9090
      targetPort: http
      nodePort: 30000

The annotations in the above service YAML make sure that the service endpoint is scraped by Prometheus. The prometheus.io/port annotation should always match the target port of the service, which here is the container port named http (9090).

Create the service using the following command.

kubectl create -f service.yaml --namespace=prometheus

Once created, you can access the Prometheus dashboard using the IP of any Kubernetes node on port 30000. If you are on the cloud, make sure you have the right firewall rules to access port 30000 from your workstation.

Now if you browse to Status --> Targets, you will see all the Kubernetes endpoints connected to Prometheus automatically using service discovery.

You can head over to the homepage, select the metrics you need from the drop-down, and get the graph for the time range you specify. For example, you can graph container_cpu_usage_seconds_total.
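
For the cloud case mentioned in the note above, a LoadBalancer variant of the service could look like the following sketch. The name prometheus-lb is an assumption, chosen so that it does not clash with the NodePort service. The cloud provider decides which external IP or hostname is assigned.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-lb         # assumed name, distinct from the NodePort service
  namespace: prometheus
spec:
  selector:
    app: prometheus           # same pod selector as the NodePort service
  type: LoadBalancer
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    targetPort: http          # the named container port from deployment.yaml
```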

# Exposing Prometheus Using Ingress

If you have an existing ingress controller set up, you can create an ingress object to route the Prometheus hostname to the Prometheus backend service.

Also, you can add SSL for Prometheus in the ingress layer. You can refer to the Kubernetes ingress TLS/SSL Certificate guide for more details.

Here is a sample ingress object.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  labels:
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/name: prometheus
  name: prometheus
  namespace: prometheus
spec:
  rules:
  - host: prometheus.citizix.com
    http:
      paths:
      - backend:
          service:
            name: prometheus
            port:
              number: 9090
        path: /
        pathType: ImplementationSpecific
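
To terminate SSL at the ingress layer as mentioned above, the ingress can reference a TLS secret. The following is a sketch that assumes an NGINX ingress class named nginx and a Secret of type kubernetes.io/tls called prometheus-tls in the same namespace. The secret name is hypothetical and the certificate has to be created separately.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: prometheus
spec:
  ingressClassName: nginx             # assumed ingress class
  tls:
  - hosts:
    - prometheus.citizix.com
    secretName: prometheus-tls        # hypothetical TLS secret holding cert and key
  rules:
  - host: prometheus.citizix.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus
            port:
              number: 9090
```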

# Setting Up Kube State Metrics

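The kube-state-metrics service provides many metrics that are not available by default. Please make sure you deploy kube-state-metrics to monitor all your Kubernetes API objects like deployments, pods, jobs, cronjobs, etc. The kube-state-metrics scrape job in the config map expects the service at kube-state-metrics.kube-system.svc.cluster.local:8080.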

# Setting Up Alertmanager

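Alertmanager handles all the alerting mechanisms for Prometheus metrics. There are many integrations available to receive alerts from the Alertmanager (Slack, email, API endpoints, etc.). The Prometheus config in this guide sends alerts to alertmanager.monitoring.svc:9093, so adjust that target to match wherever you deploy Alertmanager.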
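
As a rough idea of what an Alertmanager configuration for the severity: slack label used in prometheus.rules might look like, here is a minimal sketch. The receiver names, the channel, and the webhook URL are placeholders that you would replace with your own values.

```yaml
route:
  receiver: default                   # fallback receiver for unmatched alerts
  group_by: ['alertname']
  routes:
  - matchers:
    - 'severity="slack"'              # matches the label set in prometheus.rules
    receiver: slack-notifications
receivers:
- name: default                       # receiver with no integrations, drops alerts
- name: slack-notifications
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOURS'   # placeholder webhook
    channel: '#alerts'                # hypothetical channel
    send_resolved: true               # also notify when the alert clears
```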

# Setting Up Grafana

Using Grafana, you can create dashboards from Prometheus metrics to monitor the Kubernetes cluster.

The best part is, you don’t have to write all the PromQL queries for the dashboards. There are many community dashboard templates available for Kubernetes. You can import them and modify them as per your needs.

# Setting Up Node Exporter

Node Exporter provides the Linux system-level metrics of all Kubernetes nodes.

The scrape config for node-exporter is part of the Prometheus config map. Once you deploy the node-exporter, you should see node-exporter targets and metrics in Prometheus.
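
The node-exporter scrape job keeps only endpoints whose name is node-exporter, so the exporter needs a service with exactly that name. The following is a simplified sketch of a DaemonSet plus service that would satisfy the job. The image tag, the labels, and the choice of the prometheus namespace are assumptions, and a production deployment usually adds tolerations and resource limits.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: prometheus
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true                 # report the node's network view
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.3.1    # assumed tag, pin whichever you have tested
        args:
        - --path.procfs=/host/proc      # read host metrics from the mounted paths
        - --path.sysfs=/host/sys
        ports:
        - containerPort: 9100
          name: metrics
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter                   # must match the regex in the node-exporter job
  namespace: prometheus
spec:
  selector:
    app: node-exporter
  ports:
  - name: metrics
    port: 9100
    targetPort: metrics
```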

# Prometheus Production Setup Considerations

For the production Prometheus setup, there are more configurations and parameters that need to be considered for scaling, high availability, and storage. It all depends on your environment and data volume.

For example, the Prometheus Operator project makes it easy to automate the Prometheus setup and its configurations.

Also, the CNCF project Thanos helps you aggregate metrics from multiple Kubernetes Prometheus sources and have a highly available setup with scalable storage.

# Conclusion

In this article, we learnt how to set up Prometheus on Kubernetes.