How to Set Up Prometheus Monitoring on a Kubernetes Cluster

Step-by-step guide to setting up Prometheus on Kubernetes. Deploy Prometheus with RBAC, ConfigMap, and service discovery for nodes, pods, and services. Expose the UI via port-forward, NodePort, or Ingress.

In this guide you will learn how to set up Prometheus for monitoring on a Kubernetes cluster. The setup uses Kubernetes service discovery so Prometheus automatically finds and scrapes nodes, pods, and services. You will create a namespace, RBAC, a ConfigMap with scrape configs and alert rules, a Deployment, and a Service (or Ingress) to access the Prometheus UI.

By the end you will have Prometheus running in the cluster and scraping Kubernetes API servers, nodes (kubelet/cAdvisor), and optionally pods and services that expose a /metrics endpoint. For a production-ready stack with authentication, persistence, and high availability, see the Production-Ready Prometheus on Kubernetes guide.

About Prometheus

Prometheus is an open source systems monitoring and alerting toolkit. It was originally built at SoundCloud and is now a standalone project. Prometheus stores metrics as time series data: each sample has a timestamp and a value, and each series is identified by a metric name and a set of key/value labels (e.g. job, instance), so you can query and graph metrics over time.

  • Pull model: Prometheus scrapes metrics over HTTP from targets you configure. For short-lived jobs (e.g. Kubernetes Jobs/CronJobs), you can use Pushgateway to push metrics.
  • Metrics endpoint: Targets expose metrics on a /metrics endpoint in Prometheus text format. Prometheus scrapes at a configurable interval (e.g. 15s); see the short example after this list.
  • PromQL: Prometheus provides PromQL for querying and aggregating metrics. The Prometheus UI and Grafana use PromQL for dashboards.
  • Exporters: Exporters convert third-party metrics into Prometheus format (e.g. Node Exporter for host metrics).
  • TSDB: Prometheus stores data in a local time-series database. For long-term or highly available storage, you can integrate remote write.
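
As a small illustration (the metric name and labels are examples, not something this guide deploys), a target's /metrics endpoint exposes plain-text samples such as:

http_requests_total{method="get", status="200"} 1027
http_requests_total{method="post", status="200"} 3

and a PromQL query can aggregate them, for example the per-second request rate over the last 5 minutes grouped by method:

sum(rate(http_requests_total[5m])) by (method)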

For running Prometheus locally, see How to Run Prometheus with Docker and Docker Compose. For a Linux server install, see How to Install and Configure Prometheus on a Linux Server. Prometheus is often used with Alertmanager for alerting and Grafana for dashboards.

Prerequisites

  • A Kubernetes cluster (1.20+) with kubectl configured on your workstation.
  • Optional: Node Exporter and kube-state-metrics deployed if you want node and API-object metrics (the scrape configs in this guide expect them; see the optional sections below).

If you do not have a cluster yet, you can create one with a local tool such as minikube or kind, or with a managed Kubernetes service from your cloud provider.

Prometheus is distributed as a Docker image; the Deployment below uses that image.
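
Before starting, confirm that kubectl can reach the cluster:

kubectl cluster-info
kubectl get nodes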

Create a Namespace

Create a dedicated namespace for monitoring resources:

namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: prometheus
kubectl apply -f namespace.yaml

Create RBAC (ClusterRole and ClusterRoleBinding)

Prometheus needs read access to the Kubernetes API to discover and scrape nodes, pods, and services. Create a ClusterRole with get, list, and watch on the required resources and a ClusterRoleBinding that grants it to the default ServiceAccount in the prometheus namespace.

The role below includes nodes, nodes/proxy, services, endpoints, pods, and ingresses (both legacy extensions and networking.k8s.io). Add other resources if you need them.

clusterRole.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: default
    namespace: prometheus
kubectl apply -f clusterRole.yaml
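
To confirm the binding took effect, you can ask the API server what the ServiceAccount is now allowed to do (both commands should print yes):

kubectl auth can-i list nodes --as=system:serviceaccount:prometheus:default
kubectl auth can-i watch pods --as=system:serviceaccount:prometheus:default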

Create ConfigMap (Prometheus Config and Rules)

Store the Prometheus configuration and alert rules in a ConfigMap. The container will mount it at /etc/prometheus. When you change the config, update the ConfigMap and restart the Prometheus pods.

The example config includes:

  • alerting: Optional Alertmanager (same namespace). If you do not deploy Alertmanager, Prometheus will log connection errors but still run. Use alertmanager.prometheus.svc:9093 when Alertmanager is in the prometheus namespace.
  • rule_files: Path to the rules file inside the container.
  • scrape_configs: Jobs for node-exporter, Kubernetes API servers, nodes, pods, cAdvisor, service endpoints, and kube-state-metrics. Some jobs require Node Exporter and kube-state-metrics to be deployed; otherwise those targets will be down until you add them.

config-map.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    app: prometheus
data:
  prometheus.rules: |-
    groups:
      - name: demo
        rules:
          - alert: HighPodMemory
            expr: sum(container_memory_usage_bytes) > 1
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: High memory usage (demo rule)
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alertmanagers:
        - scheme: http
          static_configs:
            - targets: ["alertmanager.prometheus.svc:9093"]
    scrape_configs:
      - job_name: node-exporter
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_endpoints_name]
            regex: node-exporter
            action: keep

      - job_name: kubernetes-apiservers
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      - job_name: kubernetes-nodes
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics

      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

      - job_name: kube-state-metrics
        static_configs:
          - targets: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]

      - job_name: kubernetes-cadvisor
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

      - job_name: kubernetes-service-endpoints
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: "true"
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
kubectl apply -f config-map.yaml
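
Because the Deployment created below starts Prometheus with --web.enable-lifecycle, later configuration changes can be picked up without recreating the pod. A sketch, assuming the Deployment is already running (ConfigMap updates can take up to a minute to reach the pod's mounted volume):

kubectl apply -f config-map.yaml
# In another terminal, forward the UI port, then ask Prometheus to reload its config:
kubectl port-forward deployment/prometheus 9090:9090 -n prometheus
curl -X POST http://localhost:9090/-/reload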

Scrape jobs in this config:

  • node-exporter: Endpoints named node-exporter (deploy the Node Exporter DaemonSet for host metrics).
  • kubernetes-apiservers: API server metrics.
  • kubernetes-nodes: Kubelet metrics per node.
  • kubernetes-pods: Pods with the annotation prometheus.io/scrape=true (and optional port/path annotations); see the example after this list.
  • kube-state-metrics: kube-state-metrics in kube-system (deploy it to get Deployment/Pod/Job object metrics).
  • kubernetes-cadvisor: Container metrics (CPU, memory) per node.
  • kubernetes-service-endpoints: Services with the annotation prometheus.io/scrape=true.
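
For the kubernetes-pods job, a workload opts in through annotations on its pod template. A minimal sketch (the app name, image, port, and path are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"   # picked up by the keep rule in kubernetes-pods
        prometheus.io/port: "8080"     # port where the app serves metrics
        prometheus.io/path: "/metrics" # optional; /metrics is the default
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:latest   # placeholder image
          ports:
            - containerPort: 8080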

Create the Prometheus Deployment

The Deployment uses the official Prometheus image, mounts the ConfigMap at /etc/prometheus, and stores TSDB data in an emptyDir volume. For production, use a PersistentVolumeClaim so data survives pod restarts; see the Production-Ready Prometheus guide.

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.37.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus/
            - --storage.tsdb.retention.time=24h
            - --web.enable-lifecycle
          ports:
            - name: http
              containerPort: 9090
              protocol: TCP
          resources:
            requests:
              cpu: 500m
              memory: 500Mi
            limits:
              cpu: "1"
              memory: 1Gi
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus/
            - name: prometheus-storage
              mountPath: /prometheus/
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus
        - name: prometheus-storage
          emptyDir: {}
kubectl apply -f deployment.yaml

Check that the pod is running:

kubectl get deployments -n prometheus
kubectl get pods -n prometheus

Accessing the Prometheus UI

You can access the Prometheus UI in three ways: port-forward (quick test), NodePort/LoadBalancer Service, or Ingress.

Option 1: Port-forward (quick test)

From your workstation:

kubectl port-forward deployment/prometheus 8080:9090 -n prometheus

Open http://localhost:8080 in your browser. Forwarding to deployment/prometheus rather than to a specific pod name keeps the command working even after the pod is recreated.

Option 2: NodePort or LoadBalancer Service

Create a Service so Prometheus is reachable on a node IP (NodePort) or via a cloud load balancer (LoadBalancer).

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    app: prometheus
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
spec:
  type: NodePort
  selector:
    app: prometheus
  ports:
    - name: http
      port: 9090
      targetPort: http
      nodePort: 30000
kubectl apply -f service.yaml

Access Prometheus at http://<node-ip>:30000. On AWS, Azure, or GCP you can use type: LoadBalancer so the cloud provider creates a load balancer; then use the assigned external IP/hostname.
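
If your cluster supports LoadBalancer Services, one quick way to switch is to patch the existing Service (or change type in service.yaml and re-apply):

kubectl patch svc prometheus -n prometheus -p '{"spec": {"type": "LoadBalancer"}}'
kubectl get svc prometheus -n prometheus    # wait for EXTERNAL-IP to be assigned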

The Service annotations (prometheus.io/scrape, prometheus.io/port) allow the kubernetes-service-endpoints job to scrape this service if you want Prometheus to scrape itself via the Service.

Option 3: Ingress

If you have an Ingress controller (e.g. NGINX, Traefik), create an Ingress to expose Prometheus with a hostname and optional TLS:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: prometheus
  labels:
    app: prometheus
spec:
  ingressClassName: nginx
  rules:
    - host: prometheus.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus
                port:
                  number: 9090

Replace prometheus.example.com and ingressClassName with your values. For TLS, add a tls section and use cert-manager or your ingress controller’s TLS configuration. See Kubernetes TLS Security Hardening for TLS and security options.
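
For reference, a TLS block on this Ingress could look like the following, assuming a Secret named prometheus-tls exists in the prometheus namespace (for example one issued by cert-manager):

spec:
  tls:
    - hosts:
        - prometheus.example.com
      secretName: prometheus-tls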

Verifying the Setup

  1. Open the Prometheus UI (port-forward, NodePort, or Ingress).
  2. Go to Status → Targets. You should see targets for kubernetes-apiservers, kubernetes-nodes, kubernetes-cadvisor, and kubernetes-service-endpoints. Some may be DOWN until you deploy Node Exporter and kube-state-metrics (see below).
  3. Go to Graph, enter up and run the query. You should see series with a job label for each scrape job. A few more queries to try are shown after this list.
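
Example queries (the second and third assume the kubernetes-cadvisor and kube-state-metrics jobs are healthy):

# Number of healthy targets per scrape job
sum(up) by (job)

# CPU usage per namespace over the last 5 minutes (from cAdvisor)
sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)

# Pods per phase (from kube-state-metrics)
sum(kube_pod_status_phase) by (phase)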

Optional: Node Exporter

Node Exporter exposes host metrics (CPU, memory, disk, network) from each node. Deploy it as a DaemonSet so it runs on every node; the node-exporter scrape job in the ConfigMap discovers endpoints named node-exporter. For a step-by-step, see How to Set up Prometheus Node Exporter in Kubernetes. After deployment, the node-exporter targets in Status → Targets should turn UP.
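
A minimal sketch of a DaemonSet and a Service whose name matches the node-exporter scrape job (the namespace and image tag are assumptions, and a production setup typically also uses hostNetwork and mounts /proc and /sys, as the linked guide covers):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: prometheus
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.6.1   # assumed tag; pick a current release
          ports:
            - name: metrics
              containerPort: 9100
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter   # the scrape job keeps endpoints with this exact name
  namespace: prometheus
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100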

Optional: Kube State Metrics

kube-state-metrics exposes metrics about Kubernetes API objects (deployments, pods, jobs, CronJobs, etc.). The ConfigMap already has a job that scrapes kube-state-metrics.kube-system.svc.cluster.local:8080. Deploy kube-state-metrics in the kube-system namespace (or adjust the target in the ConfigMap), then the kube-state-metrics job will show UP.
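
Once it is deployed, confirm that the Service and endpoints the scrape job expects actually exist:

kubectl get svc kube-state-metrics -n kube-system
kubectl get endpoints kube-state-metrics -n kube-system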

Optional: Alertmanager

Alertmanager receives alerts from Prometheus and routes them to Slack, email, PagerDuty, etc. The ConfigMap points to alertmanager.prometheus.svc:9093. If you do not deploy Alertmanager, Prometheus will log connection errors but continue running. Deploy Alertmanager in the prometheus namespace and ensure a Service named alertmanager listens on port 9093. For configuring Alertmanager on Linux, see How to Install and Configure Prometheus Alertmanager in Linux.

Optional: Grafana

Grafana can use Prometheus as a data source to build dashboards. Add Prometheus’s URL (e.g. http://prometheus.prometheus.svc:9090 from inside the cluster, or your NodePort/LoadBalancer/Ingress URL from outside). Import community dashboards for Kubernetes (e.g. from Grafana.com).
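
If you provision Grafana from files, a data source definition pointing at the in-cluster Service might look like this (standard Grafana provisioning format; the name and isDefault flag are up to you):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.prometheus.svc:9090
    isDefault: true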

Production Considerations

This setup uses an emptyDir volume for Prometheus data, so metrics are lost when the pod is recreated. For production:

  • Use a PersistentVolumeClaim for /prometheus and consider retention and disk size (see the sketch after this list).
  • Restrict access to the Prometheus UI (e.g. Ingress with auth, network policies). See Production-Ready Prometheus on Kubernetes for authentication and TLS.
  • Consider Prometheus Operator or kube-prometheus-stack (Helm) to manage Prometheus, Alertmanager, and Grafana.
  • For long-term or highly available storage, consider Thanos or remote write to a compatible backend.
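
As a sketch of the first point, a PersistentVolumeClaim plus the matching volume change in deployment.yaml might look like this (the size and default storage class are assumptions; size it from your retention and ingest rate):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage
  namespace: prometheus
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi

Then replace the prometheus-storage emptyDir volume in the Deployment with:

volumes:
  - name: prometheus-storage
    persistentVolumeClaim:
      claimName: prometheus-storage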

Troubleshooting

  • Targets DOWN: Check that the corresponding component is deployed (Node Exporter, kube-state-metrics) and that the Service names and namespaces match the scrape config. For API/node/cAdvisor jobs, ensure RBAC is applied and the ServiceAccount has the ClusterRoleBinding.
  • Prometheus pod CrashLoopBackOff: Check logs with kubectl logs -n prometheus deployment/prometheus. Often the cause is invalid YAML in the ConfigMap (e.g. indentation or an invalid relabel regex). Validate prometheus.yml with promtool check config (see the example command after this list).
  • No data in Graph: Ensure at least one scrape job has UP targets and wait for the scrape interval (e.g. 15s). Check Status → Configuration to confirm the loaded config.
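
Because promtool ships in the Prometheus image, one way to validate the mounted config is to run it inside the running pod (if the pod is crash-looping, run promtool against the prometheus.yml content from config-map.yaml locally instead):

kubectl exec -n prometheus deploy/prometheus -- promtool check config /etc/prometheus/prometheus.yml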

Summary

You have set up Prometheus on Kubernetes with:

  • A namespace, RBAC (ClusterRole/ClusterRoleBinding), and a ConfigMap with scrape configs and alert rules.
  • A Deployment that mounts the ConfigMap and uses an emptyDir for storage (replace with PVC for production).
  • Access to the UI via port-forward, NodePort/LoadBalancer Service, or Ingress.

Scrape jobs discover Kubernetes API servers, nodes, pods, services, cAdvisor, and (if deployed) Node Exporter and kube-state-metrics. For a production-grade stack with persistence, authentication, and alerting, see the Production-Ready Prometheus on Kubernetes guide.
