How to Set Up Robusta for Kubernetes Monitoring on Amazon EKS (Step-by-Step)

Deploy Robusta Kubernetes observability on EKS: Helm, IRSA, External Secrets, Slack alerts. Multi-region setup, troubleshooting, and best practices for Kubernetes monitoring.

Kubernetes monitoring on Amazon EKS gets complex when you run multiple clusters or need fast root-cause analysis. Robusta is an AI-powered Kubernetes observability platform that enriches alerts with logs, events, and metrics—and can integrate with Prometheus, Alertmanager, and Slack. This guide walks you through deploying Robusta on EKS with Helm, IRSA (IAM Roles for Service Accounts), and External Secrets, so you get production-ready monitoring without hardcoded credentials.

In this guide you’ll learn:

  • What Robusta is and how it fits into EKS monitoring
  • How to create IAM roles for Robusta with Terraform (IRSA)
  • How to configure Slack and the Robusta UI, and store secrets in AWS Secrets Manager
  • How to deploy Robusta with Helm/Helmfile and verify it across clusters
  • Troubleshooting, security, and cost tips for running Robusta on EKS

What is Robusta?

Robusta is a Kubernetes troubleshooting and monitoring platform that combines alert enrichment, automated diagnostics, and AI-powered root cause analysis. Unlike traditional monitoring tools, Robusta:

  • Enriches alerts automatically with relevant logs, metrics, and Kubernetes events
  • Uses AI (HolmesGPT) to analyze issues and suggest fixes
  • Integrates with existing tools like Slack, Prometheus, and AlertManager
  • Provides a centralized UI for historical analysis across all clusters
  • Automates common troubleshooting tasks through customizable playbooks

Architecture Overview

Before diving into the setup, let’s understand how Robusta integrates with EKS:

┌─────────────────────────────────────────────────────┐
│  EKS Cluster (us-west-2 / eu-west-1)                │
│                                                     │
│  ┌──────────────────────────────────────┐           │
│  │  Namespace: robusta                  │           │
│  │                                      │           │
│  │  ┌────────────┐     ┌──────────────┐ │           │
│  │  │ Runner     │     │ Forwarder    │ │           │
│  │  │ - Playbooks│────▶│ - Event relay│─┼───────────┼──▶ Robusta UI
│  │  │ - Enrichers│     │ - UI sync    │ │           │    platform.robusta.dev
│  │  └─────┬──────┘     └──────────────┘ │           │
│  │        │ (IRSA)                      │           │
│  │        ▼                             │           │
│  │  Service Account                     │           │
│  └────────┬─────────────────────────────┘           │
│           │                                         │
└───────────┼─────────────────────────────────────────┘
            │
    IAM Role (IRSA)
    - CloudWatch Logs (RO)
    - EKS Describe (RO)
    - EC2 Describe (RO)

Key Components:

  1. Runner: Executes playbooks, enriches alerts, collects cluster data
  2. Forwarder: Watches Kubernetes API and relays events to the runner
  3. IRSA (IAM Roles for Service Accounts): Provides AWS permissions without credentials
  4. External Secrets: Injects credentials from AWS Secrets Manager
  5. Robusta UI Sink: Sends data to the centralized platform

Prerequisites

Before starting, ensure you have:

Tools Installed

  • kubectl - Kubernetes CLI
  • helm - Kubernetes package manager
  • helmfile - Declarative Helm deployment tool
  • aws-cli - AWS command line interface
  • terraform - Infrastructure as code tool
  • robusta-cli - Robusta command line tool

Existing Infrastructure

  • One or more EKS clusters (this guide assumes prod and stage clusters in us-west-2 and eu-west-1)
  • External Secrets Operator installed, with a ClusterSecretStore named aws-secret-manager that points at AWS Secrets Manager
  • Optionally, Prometheus and Alertmanager if you want metric-based alerts

Required Permissions

  • EKS cluster admin access
  • AWS IAM permissions to create roles and policies
  • AWS Secrets Manager write access
  • Slack admin access to create apps

Step 1: Create IAM Roles with Terraform

Robusta needs read-only access to AWS services for enriching alerts with CloudWatch logs, EC2 instance information, and EKS metadata. We’ll use IRSA (IAM Roles for Service Accounts) for secure, credential-less access.

Directory Structure

First, create this directory structure for your Terraform configuration:

terraform/robusta-iam/
├── providers.tf                   # Terraform and AWS provider configuration
├── variables.tf                   # Variable definitions
├── main.tf                        # IAM policy resource
├── policy.json                    # IAM policy document
├── irsa.tf                        # IRSA role configuration
├── outputs.tf                     # Output definitions
├── prod-us-west-2.tfvars         # Production US West 2 variables
├── prod-eu-west-1.tfvars         # Production EU West 1 variables
├── stage-us-west-2.tfvars        # Staging US West 2 variables
└── stage-eu-west-1.tfvars        # Staging EU West 1 variables

1.1 Set Up Terraform Configuration

First, let’s create the Terraform provider configuration:

File: terraform/robusta-iam/providers.tf

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Optional: Configure remote state backend
  # backend "s3" {
  #   bucket = "your-terraform-state-bucket"
  #   key    = "robusta-iam/terraform.tfstate"
  #   region = "us-west-2"
  # }
}

provider "aws" {
  region = var.aws_region
}

1.2 Create IAM Policy

Create the IAM policy that defines what permissions Robusta will have:

File: terraform/robusta-iam/policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListClusters",
        "eks:DescribeNodegroup",
        "eks:ListNodegroups"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeVolumes",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}

Why these permissions?

  • CloudWatch Logs: Robusta retrieves pod logs when alerts fire
  • EKS: Gets cluster and node group information for context
  • EC2: Describes instances to correlate pods with underlying infrastructure
  • CloudWatch Metrics: Enriches alerts with CPU, memory, and other metrics
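
A quick way to double-check that the policy stays read-only is to scan its action names. This POSIX-shell sketch (the action list is an abbreviated copy of the Action arrays in policy.json above) flags anything that is not a Describe/List/Get/Filter call:

```shell
# Audit sketch: every action granted to Robusta should be read-only.
ACTIONS="logs:DescribeLogGroups logs:GetLogEvents logs:FilterLogEvents \
eks:DescribeCluster eks:ListClusters ec2:DescribeInstances \
cloudwatch:GetMetricData cloudwatch:ListMetrics"
for a in $ACTIONS; do
  case "${a#*:}" in                      # strip the "service:" prefix
    Describe*|List*|Get*|Filter*) ;;     # read-only verbs: OK
    *) echo "write-capable action: $a"; exit 1 ;;
  esac
done
echo "all actions are read-only"
```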

File: terraform/robusta-iam/variables.tf

variable "environment" {
  description = "Environment name (prod, stage)"
  type        = string
}

variable "aws_region" {
  description = "AWS region"
  type        = string
}

variable "account_id" {
  description = "AWS Account ID"
  type        = string
}

variable "oidc_provider_id" {
  description = "EKS cluster OIDC provider ID"
  type        = string
}

variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
}

File: terraform/robusta-iam/main.tf

# Create IAM policy for Robusta
resource "aws_iam_policy" "robusta_controller" {
  name        = "robusta-controller-${var.environment}-${var.aws_region}"
  path        = "/"
  description = "IAM policy for Robusta Kubernetes observability"

  policy = file("${path.module}/policy.json")

  tags = {
    terraform   = "yes"
    environment = var.environment
    owner       = "platform-team"
  }
}

1.3 Create IRSA Role

Now create the IAM role that the Robusta service account will assume:

File: terraform/robusta-iam/irsa.tf

# Get the OIDC provider
data "aws_iam_openid_connect_provider" "eks" {
  url = "https://oidc.eks.${var.aws_region}.amazonaws.com/id/${var.oidc_provider_id}"
}

# Create IRSA role for Robusta service account
module "robusta_irsa_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.30"

  role_name = "robusta-controller-${var.environment}-${var.aws_region}"

  role_policy_arns = {
    robusta_policy = aws_iam_policy.robusta_controller.arn
  }

  oidc_providers = {
    main = {
      provider_arn               = data.aws_iam_openid_connect_provider.eks.arn
      namespace_service_accounts = ["robusta:robusta-runner-service-account"]
    }
  }

  tags = {
    terraform   = "yes"
    environment = var.environment
    owner       = "platform-team"
  }
}

File: terraform/robusta-iam/outputs.tf

output "role_arn" {
  description = "ARN of the IAM role for Robusta"
  value       = module.robusta_irsa_role.iam_role_arn
}

output "policy_arn" {
  description = "ARN of the IAM policy for Robusta"
  value       = aws_iam_policy.robusta_controller.arn
}

Key Points:

  • OIDC Provider: Links the IAM role to your EKS cluster’s identity provider
  • Service Account: Only the robusta:robusta-runner-service-account can assume this role
  • Policy Attachment: Attaches the policy we created in the previous step

1.4 Deploy the IAM Resources

Create a variables file for your environment:

File: terraform/robusta-iam/prod-us-west-2.tfvars

environment      = "prod"
aws_region       = "us-west-2"
account_id       = "00000000000"
oidc_provider_id = "EXAMPLED539D4633E53DE1B71EXAMPLE"
cluster_name     = "production-cluster-us-west-2"

Finding your OIDC Provider ID:

# Get the OIDC provider URL from your EKS cluster
aws eks describe-cluster \
  --name production-cluster-us-west-2 \
  --region us-west-2 \
  --query "cluster.identity.oidc.issuer" \
  --output text

# Output: https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE
# The OIDC ID is the part after /id/
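
If you script this, the ID can be cut out of the issuer URL with plain parameter expansion (the sample URL from the output above is hardcoded here; in practice, capture the aws eks describe-cluster output instead):

```shell
# Sample issuer URL; substitute the describe-cluster output in real use
ISSUER_URL="https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
OIDC_ID="${ISSUER_URL##*/id/}"   # drop everything up to and including "/id/"
echo "$OIDC_ID"                  # -> EXAMPLED539D4633E53DE1B71EXAMPLE
```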

Now deploy the Terraform configuration:

cd terraform/robusta-iam

# Initialize Terraform (first time only)
terraform init

# Review what will be created
terraform plan -var-file="prod-us-west-2.tfvars"

# Apply the configuration
terraform apply -var-file="prod-us-west-2.tfvars"

After applying, Terraform will output the IAM role ARN:

Outputs:

role_arn = "arn:aws:iam::00000000000:role/robusta-controller-prod-us-west-2-20260204123137827600000001"
policy_arn = "arn:aws:iam::00000000000:policy/robusta-controller-prod-us-west-2"

Copy the role_arn output - you’ll need it for the Helm values files in Step 5.

Repeat these steps for all your environments and regions:

# Production EU West 1
terraform apply -var-file="prod-eu-west-1.tfvars"

# Staging US West 2
terraform apply -var-file="stage-us-west-2.tfvars"

# Staging EU West 1
terraform apply -var-file="stage-eu-west-1.tfvars"

Step 2: Configure Slack Integration

Robusta sends rich, context-aware notifications to Slack. Let’s set up the Slack app.

2.1 Create Slack App

  1. Go to Slack API Apps
  2. Click Create New App → From scratch
  3. Name it “Robusta Monitoring” and select your workspace
  4. Click Create App

2.2 Configure Bot Permissions

  1. Go to OAuth & Permissions in the left sidebar
  2. Scroll to Bot Token Scopes
  3. Add these scopes:
    • chat:write - Send messages to channels
    • chat:write.public - Post to channels without joining
    • files:write - Upload files (logs, graphs, etc.)

2.3 Install App and Get Token

  1. Scroll to the top and click Install to Workspace
  2. Review permissions and click Allow
  3. Copy the Bot User OAuth Token (starts with xoxb-)
  4. Save this token - you’ll need it shortly

2.4 Add Bot to Channel

  1. Create or select a Slack channel (e.g., #robusta-alerts)
  2. In the channel, type: /invite @Robusta Monitoring
  3. Or click channel name → Integrations → Add apps

Step 3: Set Up Robusta Platform Account

Robusta provides a centralized UI for viewing alerts, running investigations, and using AI analysis across all your clusters.

3.1 Install Robusta CLI

pip3 install -U robusta-cli --no-cache

3.2 Generate UI Integration Token

# Configure with your details
robusta integrations ui

When prompted:

  • Email: Your work email (used for login)
  • Organization: Your company name (e.g., “Acme Corp”)

The CLI will output a base64-encoded token. Copy this token - we’ll add it to AWS Secrets Manager.
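
Before storing it, it is worth confirming the token is intact base64. A hypothetical stand-in token is generated below, since the real one is account-specific:

```shell
# Stand-in for the real CLI output; the real token is much longer
TOKEN=$(printf '%s' '{"account_id":"example"}' | base64)

# A valid token decodes without error
printf '%s' "$TOKEN" | base64 -d > /dev/null && echo "token decodes cleanly"
```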

Step 4: Store Credentials in AWS Secrets Manager

We’ll use AWS Secrets Manager to securely store all Robusta credentials, then inject them into Kubernetes using the External Secrets Operator. If you haven’t set up External Secrets yet, follow how to use External Secrets with AWS Secrets Manager first.

4.1 Create the Secret

aws secretsmanager create-secret \
  --name robusta \
  --description "Robusta credentials for production clusters" \
  --secret-string '{
    "accountId": "b66d0445-0215-4993-8732-0ea1fe730558",
    "signingKey": "your-signing-key-from-platform.robusta.dev",
    "robustaUiToken": "your-base64-token-from-cli",
    "slackChannel": "#robusta-alerts",
    "slackApiKey": "xoxb-your-slack-bot-token"
  }'

Where to get these values:

  • accountId and signingKey: from your account at platform.robusta.dev
  • robustaUiToken: the base64 token generated by the CLI in Step 3
  • slackChannel and slackApiKey: the channel and bot token from Step 2

4.2 Verify Secret

aws secretsmanager get-secret-value \
  --secret-id robusta \
  --query SecretString \
  --output text | jq
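
Beyond eyeballing the JSON, you can assert that every key the ExternalSecret in Step 5 maps is actually present. Here a sample payload stands in for the get-secret-value output:

```shell
# Sample payload standing in for the real secret
SECRET='{"accountId":"a","signingKey":"b","robustaUiToken":"c","slackChannel":"#robusta-alerts","slackApiKey":"xoxb-1"}'

# Fail loudly if any of the five expected keys is missing
for key in accountId signingKey robustaUiToken slackChannel slackApiKey; do
  printf '%s' "$SECRET" | grep -q "\"$key\"" || { echo "missing: $key"; exit 1; }
done
echo "all keys present"
```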

Step 5: Create Helm Release Configuration

Now we’ll create the declarative Helm configuration for deploying Robusta across multiple clusters.

5.1 Directory Structure

Create this structure in your infrastructure repository:

k8s/releases/robusta/
├── helmfile.yaml                    # Main Helm deployment config
├── external-secret.yaml             # External Secrets configuration
├── values-stage-us-west-2.yml       # Staging US West 2 values
├── values-stage-eu-west-1.yml       # Staging EU West 1 values
├── values-prod-us-west-2.yml        # Production US West 2 values
├── values-prod-eu-west-1.yml        # Production EU West 1 values
├── setup-secrets.sh                 # Helper script for secret setup
└── readme.md                        # Documentation

5.2 Create Helmfile

File: k8s/releases/robusta/helmfile.yaml

helmDefaults:
  createNamespace: true
  timeout: 600
  wait: true

repositories:
  - name: robusta
    url: https://robusta-charts.storage.googleapis.com

releases:
  - name: robusta
    namespace: robusta
    chart: robusta/robusta
    version: "0.32.0"
    values:
      - ./values-{{ requiredEnv "CLUSTER_ENV" }}-{{ requiredEnv "CLUSTER_REGION" }}.yml
    hooks:
      - events: ["presync"]
        showlogs: true
        command: "sh"
        args:
          [
            "-c",
            "kubectl create namespace robusta --dry-run=client -o yaml | kubectl apply -f -",
          ]
      - events: ["presync"]
        showlogs: true
        command: "kubectl"
        args: ["apply", "-f", "./external-secret.yaml"]

Key Features:

  • Environment Variables: Uses CLUSTER_ENV and CLUSTER_REGION to select the correct values file
  • Presync Hooks: Creates namespace and applies External Secret before Helm installation
  • Version Pinning: Explicit version ensures reproducible deployments
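
The filename selection above reduces to simple string splicing, which you can sanity-check before running helmfile:

```shell
# The same substitution helmfile performs with requiredEnv
export CLUSTER_ENV=prod
export CLUSTER_REGION=us-west-2
echo "values-${CLUSTER_ENV}-${CLUSTER_REGION}.yml"   # -> values-prod-us-west-2.yml
```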

5.3 Create External Secret Configuration

File: k8s/releases/robusta/external-secret.yaml

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: robusta-secrets
  namespace: robusta
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secret-manager
    kind: ClusterSecretStore
  target:
    name: robusta-secrets
    creationPolicy: Owner
  data:
    - secretKey: signing-key
      remoteRef:
        key: robusta
        property: signingKey
    - secretKey: account-id
      remoteRef:
        key: robusta
        property: accountId
    - secretKey: robusta-ui-token
      remoteRef:
        key: robusta
        property: robustaUiToken
    - secretKey: slack-channel
      remoteRef:
        key: robusta
        property: slackChannel
    - secretKey: slack-api-key
      remoteRef:
        key: robusta
        property: slackApiKey

How it works:

  1. External Secrets Operator reads from AWS Secrets Manager
  2. Creates a Kubernetes Secret named robusta-secrets
  3. Refreshes every hour to pick up any credential changes
  4. Robusta pods mount this secret as environment variables
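
The resulting Kubernetes Secret stores each property base64-encoded, which is what you will see in kubectl get secret robusta-secrets -o yaml. A local round-trip shows the encoding:

```shell
# Encode a value the way the synced Secret stores it...
CHANNEL="#robusta-alerts"
ENCODED=$(printf '%s' "$CHANNEL" | base64)
echo "slack-channel: $ENCODED"

# ...and decode it back, as you would when inspecting the Secret
DECODED=$(printf '%s' "$ENCODED" | base64 -d)
echo "$DECODED"   # -> #robusta-alerts
```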

5.4 Create Values Files

Robusta v0.32.0 uses a two-step secret injection pattern:

  1. Load secrets as environment variables via runner.additional_env_vars
  2. Reference them in configuration using {{ env.VARIABLE }} syntax
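
Robusta resolves the {{ env.VARIABLE }} placeholders internally at startup. As a rough shell analogue (sed standing in for Robusta's own templating engine), the substitution looks like this:

```shell
export SLACK_CHANNEL="#robusta-alerts"
TEMPLATE='slack_channel: {{ env.SLACK_CHANNEL }}'

# sed here only illustrates the placeholder substitution Robusta performs
echo "$TEMPLATE" | sed "s|{{ env.SLACK_CHANNEL }}|$SLACK_CHANNEL|"
```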

File: k8s/releases/robusta/values-prod-us-west-2.yml

# Robusta configuration for production - us-west-2
globalConfig:
  signing_key: "{{ env.ROBUSTA_SIGNING_KEY }}"
  account_id: "{{ env.ROBUSTA_ACCOUNT_ID }}"
  prometheus_url: ""
  alertmanager_url: ""

# Cluster identification
clusterName: production-cluster-us-west-2

# Enable components
enableRunner: true
enableForwarder: true

# Service Account with IRSA for AWS access
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::00000000000:role/robusta-controller-prod-us-west-2-20260204123137827600000001

# Robusta sinks configuration
sinksConfig:
  - robusta_sink:
      name: robusta_ui_sink
      token: "{{ env.ROBUSTA_UI_TOKEN }}"
  - slack_sink:
      name: main_slack_sink
      slack_channel: "{{ env.SLACK_CHANNEL }}"
      api_key: "{{ env.SLACK_API_KEY }}"

# Custom playbooks
customPlaybooks:
  - triggers:
      - on_prometheus_alert:
          alert_name: KubePodCrashLooping
    actions:
      - create_finding:
          title: "Pod is crash looping"
          aggregation_key: "KubePodCrashLooping"
      - pod_events_enricher: {}

# Resource limits and environment variables
runner:
  sendAdditionalTelemetry: false
  additional_env_vars:
    - name: ROBUSTA_SIGNING_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: signing-key
    - name: ROBUSTA_ACCOUNT_ID
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: account-id
    - name: ROBUSTA_UI_TOKEN
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: robusta-ui-token
    - name: SLACK_CHANNEL
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: slack-channel
    - name: SLACK_API_KEY
      valueFrom:
        secretKeyRef:
          name: robusta-secrets
          key: slack-api-key
  resources:
    requests:
      cpu: 200m
      memory: 1Gi
    limits:
      cpu: 1000m
      memory: 2Gi

forwarder:
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi

# Environment-specific settings
env:
  - name: CLUSTER_NAME
    value: "production-us-west-2"
  - name: AWS_REGION
    value: "us-west-2"
  - name: CLUSTER_ENV
    value: "prod"

Important Configuration Points:

  1. Service Account Annotation: Must match the IAM role ARN from Step 1
  2. Cluster Name: Unique identifier for this cluster in Robusta UI
  3. Empty Prometheus URLs: Prevents harmless startup warnings if you don’t use Prometheus
  4. Two Sinks:
    • robusta_sink: Sends data to Robusta UI
    • slack_sink: Sends real-time notifications to Slack
  5. Custom Playbook: Example that enriches pod crash loop alerts with events

Create similar files for:

  • values-prod-eu-west-1.yml
  • values-stage-us-west-2.yml
  • values-stage-eu-west-1.yml

Adjust:

  • clusterName for each environment/region
  • eks.amazonaws.com/role-arn to match your IAM roles
  • Resource limits (staging can use less than production)

Step 6: Deploy Robusta

Now that everything is configured, let’s deploy!

6.1 Deploy to Production US West 2

# Navigate to the robusta release directory
cd k8s/releases/robusta

# Set environment variables
export CLUSTER_ENV=prod
export CLUSTER_REGION=us-west-2

# Deploy with helmfile
helmfile sync

What happens:

  1. Helmfile creates the robusta namespace
  2. Applies the ExternalSecret to create robusta-secrets
  3. Installs Robusta Helm chart with the production US West 2 values
  4. Runner pod starts and connects to Robusta UI
  5. Forwarder pod starts watching Kubernetes API

6.2 Verify Deployment

# Check pods are running
kubectl get pods -n robusta

# Expected output:
# NAME                                 READY   STATUS    RESTARTS   AGE
# robusta-runner-xxxxxxxxxx-xxxxx      1/1     Running   0          2m
# robusta-forwarder-xxxxxxxx-xxxx      1/1     Running   0          2m

# Check runner logs
kubectl logs -n robusta -l app=robusta-runner --tail=100

# Look for these success indicators:
# ✓ "Supabase dal login"
# ✓ "connecting to server as account_id=..."
# ✓ "Setting cluster active to True"
# ✓ "Cluster historical data sent"

# Verify External Secret synced
kubectl get externalsecret -n robusta
kubectl get secret robusta-secrets -n robusta

6.3 Deploy to Other Clusters

Repeat for each cluster:

# Production EU West 1
export CLUSTER_ENV=prod
export CLUSTER_REGION=eu-west-1
helmfile sync

# Staging US West 2
export CLUSTER_ENV=stage
export CLUSTER_REGION=us-west-2
helmfile sync

# Staging EU West 1
export CLUSTER_ENV=stage
export CLUSTER_REGION=eu-west-1
helmfile sync
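
The four deployments can also run as one loop; the helmfile call is echoed here so the sketch is safe to dry-run:

```shell
# Iterate env:region pairs; drop the echo to deploy for real
for target in prod:us-west-2 prod:eu-west-1 stage:us-west-2 stage:eu-west-1; do
  export CLUSTER_ENV="${target%%:*}"    # part before the colon
  export CLUSTER_REGION="${target#*:}"  # part after the colon
  echo "helmfile sync  # uses values-${CLUSTER_ENV}-${CLUSTER_REGION}.yml"
done
```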

Step 7: Verify in Robusta UI

7.1 Access the UI

  1. Go to platform.robusta.dev
  2. Login with your email from Step 3
  3. Navigate to Clusters page

You should see all your clusters listed with:

  • Name: e.g., “production-cluster-us-west-2”
  • Status: Green “Connected”
  • Version: “0.32.0”
  • Last Seen: Recent timestamp

7.2 Explore the Timeline

Click on Timeline to see:

  • Pod crashes and restarts
  • Deployment updates
  • Node changes
  • Kubernetes events

Each event is enriched with:

  • Related logs
  • Resource manifests
  • Recent changes
  • AI analysis (if enabled)

7.3 Test Slack Integration

Create a test alert:

# Create a pod that will crash
kubectl run test-crash --image=busybox --restart=Never -- sh -c "exit 1"

# Check Slack channel
# You should see a notification with:
# - Pod name and namespace
# - Crash reason
# - Container logs
# - Recent events

Clean up:

kubectl delete pod test-crash

Step 8: Customize Playbooks

Playbooks are Robusta’s automation rules. They define what happens when specific events occur.

8.1 Add a Deployment Update Playbook

Edit your values file:

customPlaybooks:
  # Existing playbook for crash loops
  - triggers:
      - on_prometheus_alert:
          alert_name: KubePodCrashLooping
    actions:
      - create_finding:
          title: "Pod is crash looping"
      - pod_events_enricher: {}

  # New playbook for deployments
  - triggers:
      - on_deployment_update:
          namespace_prefix: production
    actions:
      - create_finding:
          title: "Production deployment updated"
          severity: INFO
      - deployment_status_enricher: {}
      - deployment_events_enricher: {}

Redeploy:

helmfile sync

8.2 Common Enrichment Actions

Here are useful enrichment actions you can add to any playbook:

  • pod_events_enricher: {} - Show recent pod events
  • logs_enricher: {} - Attach pod logs
  • pod_graph_enricher: {} - Add CPU/memory graphs
  • prometheus_enricher: {} - Add Prometheus metrics (if configured)
  • related_pods: {} - Show related pods (same deployment, etc.)
  • deployment_events_enricher: {} - Show deployment events
  • resource_events_enricher: {} - Show resource-level events

Full reference: Robusta Actions Documentation

Troubleshooting

Pod CrashLoopBackOff

Check logs:

kubectl logs -n robusta -l app=robusta-runner --tail=200

Common issues:

  1. Action not found (e.g., “Action pod_events not found”)
    • Cause: Action names changed in v0.32.0
    • Fix: Use pod_events_enricher instead of pod_events
  2. Invalid configuration format
    • Cause: Using old valueFrom syntax in globalConfig
    • Fix: Use {{ env.VAR }} syntax instead
  3. Missing environment variables
    • Cause: runner.additional_env_vars not configured
    • Fix: Ensure all env vars are loaded from secrets

Slack Errors

Error: missing_scope - Need files:write

The Slack bot is missing required permissions.

Fix:

  1. Go to api.slack.com/apps → Select your app
  2. Go to OAuth & Permissions
  3. Under Bot Token Scopes, add files:write
  4. Click Reinstall App
  5. Copy the new Bot OAuth Token
  6. Update AWS Secrets Manager:
aws secretsmanager update-secret \
  --secret-id robusta \
  --secret-string '{
    "accountId": "...",
    "signingKey": "...",
    "robustaUiToken": "...",
    "slackChannel": "#robusta-alerts",
    "slackApiKey": "xoxb-NEW-TOKEN-HERE"
  }'

# Force sync
kubectl delete externalsecret robusta-secrets -n robusta
kubectl apply -f external-secret.yaml
kubectl rollout restart deployment robusta-runner -n robusta

Error: not_in_channel

The bot hasn’t been added to the Slack channel.

Fix:

  1. Go to your Slack channel
  2. Type /invite @RobustaBot
  3. Or click channel name → Integrations → Add apps

No restart needed - works immediately.

Cluster Not Appearing in UI

1. Check if runner is connected:

kubectl logs -n robusta -l app=robusta-runner --tail=200 | grep -i "robusta_sink\|connecting"

Look for:

Adding <class 'robusta.core.sinks.robusta.robusta_sink_params.RobustaSinkConfigWrapper'> sink named robusta_ui_sink
connecting to server as account_id=...

2. Verify UI token:

# Token should be exactly 572 characters
kubectl get secret robusta-secrets -n robusta -o jsonpath='{.data.robusta-ui-token}' | base64 -d | wc -c

3. Check forwarder logs:

kubectl logs -n robusta -l app=robusta-forwarder --tail=100

4. Force data sync:

# Create a test event
kubectl run test-sync --image=nginx
kubectl delete pod test-sync

# Check UI Timeline within a few seconds

IAM Permission Issues

Error: AccessDenied when calling AWS APIs

Verify IRSA configuration:

# Check service account
kubectl get serviceaccount -n robusta robusta-runner-service-account -o yaml

# Verify role annotation
kubectl get serviceaccount -n robusta robusta-runner-service-account \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'

# Should output: arn:aws:iam::ACCOUNT_ID:role/robusta-controller-...

Test role assumption:

# Exec into runner pod
kubectl exec -it -n robusta \
  $(kubectl get pod -n robusta -l app=robusta-runner -o jsonpath='{.items[0].metadata.name}') \
  -- sh

# Inside pod, check AWS identity
aws sts get-caller-identity

# Should show AssumedRole with robusta-controller role

Verify IAM role exists:

aws iam get-role --role-name robusta-controller-prod-us-west-2-SUFFIX

Best Practices

1. Use GitOps for Configuration

Store all Robusta configuration in Git:

  • Values files tracked in version control
  • External Secret configurations committed
  • Helm releases managed by ArgoCD or Flux

2. Separate Secrets by Environment

Create separate AWS Secrets Manager secrets for staging and production:

# Staging secret
aws secretsmanager create-secret --name robusta-staging ...

# Production secret
aws secretsmanager create-secret --name robusta-production ...

Update ExternalSecret per environment:

spec:
  secretStoreRef:
    name: aws-secret-manager-{{ requiredEnv "CLUSTER_ENV" }}

3. Use Specific IAM Permissions

Follow principle of least privilege:

  • Read-only access to AWS services
  • Scoped to specific resources where possible
  • Separate roles per environment

4. Monitor Resource Usage

Track Robusta’s resource consumption:

# Check current usage
kubectl top pods -n robusta

# Review limits
kubectl describe deployment robusta-runner -n robusta | grep -A 10 "Limits:"

Adjust resources in values files based on actual usage.

5. Keep Robusta Updated

Check for new releases:

helm search repo robusta/robusta --versions | head -10

Update helmfile.yaml:

version: "0.33.0" # New version

Test in staging first, then production.

6. Create Custom Playbooks for Your Stack

Tailor Robusta to your specific needs:

customPlaybooks:
  # Alert on high-traffic pods
  - triggers:
      - on_prometheus_alert:
          alert_name: HighRequestRate
    actions:
      - logs_enricher:
          filter_regex: "error|exception"
      - prometheus_enricher:
          promql_query: "rate(http_requests_total[5m])"

  # Auto-scale on memory pressure
  - triggers:
      - on_prometheus_alert:
          alert_name: HighMemoryUsage
    actions:
      - create_finding:
          title: "Memory pressure detected"
      - pod_graph_enricher: {}

Performance Considerations

Resource Requirements

Based on cluster size:

Cluster Size            Runner CPU   Runner Memory   Forwarder CPU   Forwarder Memory
Small (<50 pods)        100m         512Mi           100m            256Mi
Medium (50-200 pods)    200m         1Gi             200m            512Mi
Large (200+ pods)       500m         2Gi             500m            1Gi

Network Considerations

Robusta generates outbound traffic to:

  • Robusta UI: Event data and heartbeats
  • Slack API: Notification payloads
  • AWS APIs: CloudWatch Logs, EKS, EC2 queries

Ensure network policies and security groups allow these connections.

Storage Considerations

Robusta is stateless and doesn’t require persistent storage. However:

  • Temporary files use emptyDir volumes
  • Logs are rotated automatically
  • No PVC needed

Security Considerations

1. IRSA Instead of IAM Users

Never use IAM access keys:

  • ✅ Use IRSA (IAM Roles for Service Accounts)
  • ❌ Don’t create IAM users with access keys
  • ❌ Don’t mount AWS credentials in pods

2. Secret Rotation

Rotate credentials regularly:

# Slack token - regenerate in Slack UI
# Update AWS Secrets Manager
aws secretsmanager update-secret --secret-id robusta ...

# Robusta signing key - regenerate at platform.robusta.dev
# Update AWS Secrets Manager
aws secretsmanager update-secret --secret-id robusta ...

# Robusta UI token - regenerate with CLI
robusta integrations ui
aws secretsmanager update-secret --secret-id robusta ...

# Force pods to restart and pick up new credentials
kubectl rollout restart deployment robusta-runner -n robusta

3. Network Policies

Restrict Robusta’s network access:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: robusta-egress
  namespace: robusta
spec:
  podSelector:
    matchLabels:
      app: robusta-runner
  policyTypes:
    - Egress
  egress:
    # Allow DNS
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - protocol: UDP
          port: 53
    # Allow Kubernetes API
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 443
    # Allow Robusta platform
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443

4. Audit Logging

Enable audit logs for Robusta actions:

globalConfig:
  custom_annotations:
    - key: "robusta.dev/audit"
      value: "enabled"

Cost Optimization

1. Right-Size Resources

Start conservative, scale up based on metrics:

# Monitor actual usage over 7 days
kubectl top pods -n robusta --containers

# Adjust resources accordingly
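Once you have a baseline from kubectl top, set requests just above observed usage and leave headroom in the limits. The values below are illustrative starting points, not chart defaults, and the exact values key may differ by chart version:

```yaml
# Illustrative starting point — tune to your own `kubectl top` readings
runner:
  resources:
    requests:
      cpu: 100m
      memory: 512Mi
    limits:
      memory: 1Gi   # memory limit only; CPU limits often just add throttling
```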

2. Use Spot Instances

Robusta tolerates interruptions well:

# In your values file
tolerations:
  - key: "node.kubernetes.io/instance-type"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: "node.kubernetes.io/instance-type"
              operator: In
              values:
                - spot

3. Optimize External Secrets Refresh

Reduce API calls to AWS Secrets Manager:

spec:
  refreshInterval: 24h # Instead of 1h

Credentials don’t change frequently, so daily refresh is sufficient.

Frequently Asked Questions (FAQ)

What is Robusta in Kubernetes?

Robusta is a Kubernetes troubleshooting and observability platform that automatically enriches alerts with logs, metrics, and events. It connects to Prometheus and Alertmanager, sends notifications to Slack (and other channels), and offers AI-assisted root cause analysis via the Robusta UI.

How do I install Robusta on EKS?

Install Robusta on Amazon EKS by: (1) creating IAM policy and IRSA role with Terraform, (2) storing credentials in AWS Secrets Manager and syncing them with External Secrets, (3) deploying the Robusta Helm chart with a values file that references those secrets and your Slack/UI config. Use helmfile sync (or helm install) after setting CLUSTER_ENV and CLUSTER_REGION.

Does Robusta work with Prometheus and Alertmanager?

Yes. Robusta integrates with Prometheus and Alertmanager. You can set globalConfig.prometheus_url and globalConfig.alertmanager_url in your Helm values, and use playbooks triggered by on_prometheus_alert to enrich and route alerts to Slack or the Robusta UI.
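A minimal values snippet tying this together might look like the following. The service URLs assume an in-cluster Prometheus and Alertmanager in a monitoring namespace, and the alert name and enricher are examples, not requirements:

```yaml
globalConfig:
  prometheus_url: "http://prometheus-server.monitoring.svc.cluster.local:9090"
  alertmanager_url: "http://alertmanager.monitoring.svc.cluster.local:9093"

customPlaybooks:
  - triggers:
      - on_prometheus_alert:
          alert_name: KubePodCrashLooping
    actions:
      - logs_enricher: {}   # attach pod logs to the alert notification
```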

Why use IRSA for Robusta on EKS?

IRSA (IAM Roles for Service Accounts) lets the Robusta runner use AWS APIs (CloudWatch Logs, EKS, EC2) without storing access keys. The pod assumes an IAM role via the service account annotation, which is more secure and easier to rotate than static credentials.

How often should I rotate Robusta credentials?

Rotate Slack tokens, Robusta signing keys, and UI tokens periodically (e.g. every 90 days). Update the values in AWS Secrets Manager, then restart the runner (e.g. kubectl rollout restart deployment robusta-runner -n robusta) so it picks up the new credentials after External Secrets syncs.

Conclusion

You now have a production-ready Robusta deployment across multiple EKS clusters! This setup provides:

  • ✅ AI-powered troubleshooting with Holmes GPT
  • ✅ Real-time Slack notifications with rich context
  • ✅ Centralized UI for all clusters
  • ✅ Secure credential management with External Secrets
  • ✅ IAM roles without hardcoded credentials
  • ✅ Multi-region deployment with GitOps
  • ✅ Custom playbooks for your specific needs

Next Steps

  1. Add Prometheus Integration: Connect Robusta to Prometheus for metric enrichment—see our production Prometheus on Kubernetes guide and how to install and configure Prometheus Alertmanager for alerting.
  2. Enable Holmes AI: Set up AI-powered root cause analysis in the Robusta UI.
  3. Create More Playbooks: Automate responses to common issues.
  4. Set Up AlertManager: Integrate with existing alerting infrastructure.
  5. Configure MS Teams/PagerDuty: Add additional notification channels.

Questions or Issues?

If you encounter any problems:

  1. Check the Robusta Troubleshooting Guide
  2. Review logs: kubectl logs -n robusta -l app=robusta-runner
  3. Join Robusta Slack Community
  4. Open an issue on GitHub

Happy monitoring! 🚀
