Observability is a critical component of running production workloads on Kubernetes. When running Amazon Elastic Kubernetes Service (EKS), you need a reliable way to collect, aggregate, and analyze logs from all your containers. In this comprehensive guide, I’ll walk you through setting up Fluent Bit to export your EKS container logs to Amazon CloudWatch Logs.
By the end of this tutorial, you’ll have a production-ready logging pipeline that automatically collects logs from all containers in your EKS cluster and routes them to organized CloudWatch Log Groups based on Kubernetes namespaces.
Table of Contents
- What is Fluent Bit?
- Why Use Fluent Bit with Amazon EKS?
- Architecture Overview
- Prerequisites
- Setting Up IAM Roles for Service Accounts (IRSA)
- Creating the Helm Values Configuration
- Deploying Fluent Bit with Helmfile
- Verifying the Deployment
- Understanding the Log Routing
- Troubleshooting Common Issues
- Best Practices and Recommendations
- Conclusion
What is Fluent Bit?
Fluent Bit is a lightweight, high-performance log processor and forwarder. It’s part of the Fluentd ecosystem but is designed specifically for containerized environments where resource efficiency is critical.
Key Features of Fluent Bit
- Lightweight: Written in C, with a minimal memory footprint (~450KB)
- High Performance: Can handle millions of records per second
- Pluggable Architecture: Supports multiple inputs, filters, and outputs
- Kubernetes Native: Built-in support for Kubernetes metadata enrichment
- Cloud Native: Native integration with AWS, Azure, GCP, and other cloud providers
Fluent Bit vs Fluentd
| Feature | Fluent Bit | Fluentd |
|---|---|---|
| Memory Footprint | ~450KB | ~40MB |
| Language | C | Ruby |
| Plugin Ecosystem | Growing | Extensive |
| Use Case | Edge/Container logging | Central aggregation |
| Performance | Higher throughput | Good throughput |
For container logging in Kubernetes, Fluent Bit is the preferred choice due to its efficiency and native Kubernetes support.
Why Use Fluent Bit with Amazon EKS?
Amazon EKS doesn’t provide built-in container log collection. By default, container logs are stored on individual nodes in /var/log/containers/ and are lost when nodes are terminated or replaced. This is problematic for several reasons:
- Ephemeral Nodes: EKS nodes can be replaced at any time (especially with Karpenter or Cluster Autoscaler)
- Distributed Logs: Logs are scattered across multiple nodes
- No Centralized Search: You can’t search across all container logs
- No Retention: Logs are lost when pods restart or nodes terminate
Benefits of CloudWatch Logs Integration
- Centralized Logging: All logs in one place
- Retention Policies: Configure how long to keep logs
- CloudWatch Logs Insights: Powerful query language for log analysis
- Alarms and Metrics: Create alarms based on log patterns
- Integration: Works with AWS services like Lambda, SNS, and EventBridge
Architecture Overview
Here’s how the logging architecture works:
```text
                          EKS Cluster
  Node 1                Node 2                Node 3
  ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
  │ Container    │      │ Container    │      │ Container    │
  │ logs         │      │ logs         │      │ logs         │
  │      │       │      │      │       │      │      │       │
  │      ▼       │      │      ▼       │      │      ▼       │
  │  Fluent Bit  │      │  Fluent Bit  │      │  Fluent Bit  │
  │  DaemonSet   │      │  DaemonSet   │      │  DaemonSet   │
  └──────┬───────┘      └──────┬───────┘      └──────┬───────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │  IRSA authentication
                               ▼
                     Amazon CloudWatch Logs
      Log Group: /aws/eks/{cluster-name}/prod
      Log Group: /aws/eks/{cluster-name}/stage
      Log Group: /aws/eks/{cluster-name}/application
        └── Log Stream: pod-{name}.{namespace}.{container}
```
How It Works
- Container Logs: Kubernetes writes all container stdout/stderr to /var/log/containers/ (see the example after this list)
- Fluent Bit DaemonSet: A Fluent Bit pod runs on every node, reading log files
- Metadata Enrichment: Fluent Bit adds Kubernetes metadata (pod name, namespace, labels)
- Log Routing: Logs are routed to different CloudWatch Log Groups based on namespace
- IRSA Authentication: Fluent Bit authenticates to CloudWatch using IAM Roles for Service Accounts
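The namespace-based routing configured later in this post keys off these file names: the kubelet names each file (or symlink) in /var/log/containers/ as {pod}_{namespace}_{container}-{container-id}.log, so the namespace is recoverable from the path alone. A quick way to see this on a worker node (the pod names below are only illustrative):

```bash
# List container log files on a worker node (names are illustrative)
ls /var/log/containers/
# myapp-7d9f8b6c5d-abc12_prod_myapp-0123456789abcdef....log
# redis-0_stage_redis-abcdef0123456789....log
# coredns-5d78c9869d-xyz12_kube-system_coredns-fedcba9876543210....log
```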
Prerequisites
Before starting, ensure you have the following:
```bash
# AWS CLI v2
aws --version
# aws-cli/2.x.x
# kubectl
kubectl version --client
# Client Version: v1.28.x
# Helm v3
helm version
# version.BuildInfo{Version:"v3.x.x"}
# Helmfile (optional but recommended)
helmfile --version
# helmfile version v0.x.x
# eksctl (for IRSA setup)
eksctl version
# 0.x.x
```
AWS Requirements
- An existing EKS cluster with OIDC provider enabled
- AWS account permissions to create IAM roles and policies
- Access to create CloudWatch Log Groups
Verify EKS OIDC Provider
IRSA (IAM Roles for Service Accounts) requires an OIDC provider. Verify it’s configured:
```bash
# Get your cluster's OIDC provider
aws eks describe-cluster \
--name YOUR_CLUSTER_NAME \
--query "cluster.identity.oidc.issuer" \
--output text
# Should return something like:
# https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE
```
If no OIDC provider exists, create one:
```bash
eksctl utils associate-iam-oidc-provider \
--cluster YOUR_CLUSTER_NAME \
--approve
```
Setting Up IAM Roles for Service Accounts (IRSA)
IRSA allows Kubernetes pods to assume IAM roles without needing to store AWS credentials. This is the secure, recommended way to grant AWS permissions to pods.
Step 1: Create the IAM Policy
First, create an IAM policy that grants CloudWatch Logs permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FluentBitCloudWatchLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": "*"
    }
  ]
}
```
Save this as iam-policy.json and create the policy:
```bash
aws iam create-policy \
--policy-name FluentBitCloudWatchLogsPolicy \
--policy-document file://iam-policy.json \
--description "Allows Fluent Bit to write logs to CloudWatch"
```
Note the policy ARN from the output (e.g., arn:aws:iam::123456789012:policy/FluentBitCloudWatchLogsPolicy).
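If you'd rather not copy the ARN by hand, you can look it up with a JMESPath query after the policy exists (a small convenience sketch):

```bash
# Fetch the ARN of the policy created above
POLICY_ARN=$(aws iam list-policies \
  --scope Local \
  --query "Policies[?PolicyName=='FluentBitCloudWatchLogsPolicy'].Arn" \
  --output text)

echo "${POLICY_ARN}"
# arn:aws:iam::123456789012:policy/FluentBitCloudWatchLogsPolicy
```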
Step 2: Create the IAM Role with OIDC Trust
Using Terraform (recommended for production):
```hcl
# terraform-iam.tf

# Data source to get the EKS cluster OIDC provider
data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

# Current AWS account ID (used to build the policy ARN below)
data "aws_caller_identity" "current" {}

# Extract OIDC provider URL without https://
locals {
  oidc_provider = replace(data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer, "https://", "")
}

# Data source for the OIDC provider ARN
data "aws_iam_openid_connect_provider" "cluster" {
  url = data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer
}

# IAM Role for Fluent Bit
resource "aws_iam_role" "fluent_bit_cloudwatch" {
  name = "${var.cluster_name}-fluent-bit-cloudwatch-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = data.aws_iam_openid_connect_provider.cluster.arn
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "${local.oidc_provider}:aud" = "sts.amazonaws.com"
            "${local.oidc_provider}:sub" = "system:serviceaccount:amazon-cloudwatch:aws-for-fluent-bit"
          }
        }
      }
    ]
  })

  tags = {
    Purpose = "Fluent Bit CloudWatch logging"
    Cluster = var.cluster_name
  }
}

# Attach the CloudWatch Logs policy
resource "aws_iam_role_policy_attachment" "fluent_bit_cloudwatch" {
  role       = aws_iam_role.fluent_bit_cloudwatch.name
  policy_arn = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:policy/FluentBitCloudWatchLogsPolicy"
}

# Output the role ARN for use in Helm values
output "fluent_bit_role_arn" {
  description = "IAM role ARN for Fluent Bit service account"
  value       = aws_iam_role.fluent_bit_cloudwatch.arn
}
```
Alternatively, using eksctl:
```bash
eksctl create iamserviceaccount \
--cluster=YOUR_CLUSTER_NAME \
--namespace=amazon-cloudwatch \
--name=aws-for-fluent-bit \
--attach-policy-arn=arn:aws:iam::YOUR_ACCOUNT_ID:policy/FluentBitCloudWatchLogsPolicy \
--approve \
--override-existing-serviceaccounts
```
Understanding the Trust Relationship
The trust policy is critical for security. Let’s break it down:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE:aud": "sts.amazonaws.com",
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:amazon-cloudwatch:aws-for-fluent-bit"
        }
      }
    }
  ]
}
```
- Federated Principal: Only the specific EKS cluster’s OIDC provider can assume this role
- aud Condition: Ensures the token is intended for AWS STS
- sub Condition: Restricts to only the specific service account (aws-for-fluent-bit in the amazon-cloudwatch namespace)
This means even if another pod in the same cluster tries to use this role, it will be denied unless it’s running as the exact service account specified.
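Once Fluent Bit is deployed (later in this post), you can see this wiring from the pod's side: the EKS pod identity webhook injects the role ARN and a projected token path into every pod that uses the annotated service account. A quick check (the pod name is a placeholder; substitute one of your Fluent Bit pods):

```bash
# The IRSA webhook injects these variables into pods using the service account
kubectl exec -n amazon-cloudwatch aws-for-fluent-bit-abc12 -- env | \
  grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'
# AWS_ROLE_ARN=arn:aws:iam::123456789012:role/staging-us-west-2-fluent-bit-cloudwatch-role
# AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
```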
Creating the Helm Values Configuration
AWS provides an official Helm chart for Fluent Bit: aws-for-fluent-bit. We’ll configure it with comprehensive settings for production use.
Step 1: Create the Helmfile
Create helmfile.yaml:
```yaml
# helmfile.yaml
repositories:
  - name: aws-eks-charts
    url: https://aws.github.io/eks-charts

releases:
  - name: aws-for-fluent-bit
    namespace: amazon-cloudwatch
    createNamespace: true
    chart: aws-eks-charts/aws-for-fluent-bit
    version: "0.2.0"
    timeout: 300
    values:
      - values-{{ requiredEnv "CLUSTER_ENV" }}-{{ requiredEnv "CLUSTER_REGION" }}.yml
```
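Before applying anything, it can be useful to render the chart locally and confirm the environment-specific values file resolves as expected. helmfile template only renders manifests; helmfile diff (which requires the helm-diff plugin) shows what would change against the live release:

```bash
# Render manifests locally to confirm the values file is picked up
export CLUSTER_ENV="stage"
export CLUSTER_REGION="us-west-2"
helmfile template | less

# Show what would change against the currently installed release
# (requires the helm-diff plugin)
helmfile diff
```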
Step 2: Create the Values File
Create a values file for your environment. Here’s a comprehensive example for a staging cluster in us-west-2:
```yaml
# values-stage-us-west-2.yml

# Global settings
global:
  namespaceOverride: "amazon-cloudwatch"

# Container image configuration
image:
  repository: public.ecr.aws/aws-observability/aws-for-fluent-bit
  tag: "2.32.2"
  pullPolicy: IfNotPresent

# Service Account configuration with IRSA
serviceAccount:
  create: true
  name: aws-for-fluent-bit
  annotations:
    # Replace with your IAM role ARN
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/staging-us-west-2-fluent-bit-cloudwatch-role"

# Environment variables
env:
  - name: AWS_REGION
    value: "us-west-2"
  - name: CLUSTER_NAME
    value: "staging-us-west-2"

# Resource limits and requests
resources:
  limits:
    memory: 250Mi
  requests:
    cpu: 100m
    memory: 100Mi

# Tolerations - ensure Fluent Bit runs on ALL nodes
tolerations:
  - operator: Exists
    effect: NoSchedule

# Fluent Bit Configuration
config:
  # Service configuration
  service: |
    [SERVICE]
        Daemon          Off
        Flush           5
        Log_Level       info
        Parsers_File    /fluent-bit/etc/parsers.conf
        HTTP_Server     On
        HTTP_Listen     0.0.0.0
        HTTP_Port       2020
        Health_Check    On

  # Input configuration - read container logs
  inputs: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/fluent-bit/state/flb_container.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  # Filters for Kubernetes metadata enrichment
  filters: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              On
        Annotations         Off

    [FILTER]
        Name          nest
        Match         kube.*
        Operation     lift
        Nested_under  kubernetes
        Add_prefix    k8s_

    [FILTER]
        Name    modify
        Match   kube.*
        Add     cluster_name ${CLUSTER_NAME}

  # CloudWatch outputs - route by namespace
  outputs: |
    # Production namespace logs
    [OUTPUT]
        Name                 cloudwatch_logs
        Match_Regex          kube\.var\.log\.containers\..+_prod_.+
        region               ${AWS_REGION}
        log_group_name       /aws/eks/${CLUSTER_NAME}/prod
        log_stream_prefix    pod-
        log_stream_template  $kubernetes['pod_name'].$kubernetes['namespace_name'].$kubernetes['container_name']
        auto_create_group    true
        log_key              log
        log_format           json/emf
        Retry_Limit          2

    # Staging namespace logs
    [OUTPUT]
        Name                 cloudwatch_logs
        Match_Regex          kube\.var\.log\.containers\..+_stage_.+
        region               ${AWS_REGION}
        log_group_name       /aws/eks/${CLUSTER_NAME}/stage
        log_stream_prefix    pod-
        log_stream_template  $kubernetes['pod_name'].$kubernetes['namespace_name'].$kubernetes['container_name']
        auto_create_group    true
        log_key              log
        log_format           json/emf
        Retry_Limit          2

    # All other namespaces (default)
    [OUTPUT]
        Name                 cloudwatch_logs
        Match                kube.*
        region               ${AWS_REGION}
        log_group_name       /aws/eks/${CLUSTER_NAME}/application
        log_stream_prefix    pod-
        log_stream_template  $kubernetes['pod_name'].$kubernetes['namespace_name'].$kubernetes['container_name']
        auto_create_group    true
        log_key              log
        log_format           json/emf
        Retry_Limit          2

# Disable other outputs (we only want CloudWatch)
cloudWatchLogs:
  enabled: false  # We define custom outputs above
firehose:
  enabled: false
kinesis:
  enabled: false
elasticsearch:
  enabled: false

# DaemonSet specific settings
hostNetwork: false

# Volume mounts for log access
volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
  - name: fluent-bit-state
    hostPath:
      path: /var/fluent-bit/state

volumeMounts:
  - name: varlog
    mountPath: /var/log
    readOnly: true
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: fluent-bit-state
    mountPath: /var/fluent-bit/state
```
Understanding the Configuration
Let’s break down the key sections:
Service Configuration
```ini
[SERVICE]
    Daemon        Off     # Run in foreground (required for containers)
    Flush         5       # Flush logs every 5 seconds
    Log_Level     info    # Logging verbosity
    HTTP_Server   On      # Enable metrics endpoint
    HTTP_Port     2020    # Metrics port for Prometheus scraping
    Health_Check  On      # Enable health checks
```
Input Configuration
```ini
[INPUT]
    Name              tail                          # Use the tail input plugin
    Tag               kube.*                        # Tag all logs with kube prefix
    Path              /var/log/containers/*.log     # Read all container log files
    Parser            docker                        # Parse Docker JSON format
    DB                /var/fluent-bit/state/flb_container.db   # State DB for tracking position
    Mem_Buf_Limit     5MB                           # Memory buffer limit per file
    Skip_Long_Lines   On                            # Skip lines > 32KB
    Refresh_Interval  10                            # Check for new files every 10s
```
Kubernetes Filter
```ini
[FILTER]
    Name         kubernetes                            # Kubernetes metadata filter
    Match        kube.*                                # Apply to all kube-tagged logs
    Kube_URL     https://kubernetes.default.svc:443    # K8s API endpoint
    Merge_Log    On                                    # Parse and merge JSON logs
    Labels       On                                    # Include pod labels
    Annotations  Off                                   # Exclude annotations (can be verbose)
```
CloudWatch Output
```ini
[OUTPUT]
    Name                 cloudwatch_logs                        # CloudWatch Logs output plugin
    Match                kube.*                                 # Match all kube logs
    region               ${AWS_REGION}                          # AWS region (from env var)
    log_group_name       /aws/eks/${CLUSTER_NAME}/application
    log_stream_template  $kubernetes['pod_name']...             # Dynamic stream naming
    auto_create_group    true                                   # Create log group if missing
    Retry_Limit          2                                      # Retry failed writes twice
```
Deploying Fluent Bit with Helmfile
Create a Deployment Script
Create deploy.sh for easy deployment across environments:
```bash
#!/bin/bash
set -e

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Supported environments and regions
ENVIRONMENTS=("stage" "prod")
REGIONS=("us-west-2" "eu-west-1")

usage() {
  echo "Usage: $0 <environment> <region>"
  echo "       $0 all"
  echo "       $0 verify <environment> <region>"
  echo ""
  echo "Examples:"
  echo "  $0 stage us-west-2          # Deploy to staging US"
  echo "  $0 prod eu-west-1           # Deploy to prod EU"
  echo "  $0 all                      # Deploy to all clusters"
  echo "  $0 verify stage us-west-2   # Verify deployment"
  exit 1
}

deploy_cluster() {
  local env=$1
  local region=$2

  echo -e "${YELLOW}Deploying Fluent Bit to ${env}-${region}...${NC}"

  # Set kubectl context (adjust based on your context naming)
  kubectl config use-context "arn:aws:eks:${region}:${AWS_ACCOUNT_ID}:cluster/${env}-${region}"

  # Export environment variables for helmfile
  export CLUSTER_ENV="${env}"
  export CLUSTER_REGION="${region}"

  # Run helmfile
  helmfile apply

  echo -e "${GREEN}Successfully deployed to ${env}-${region}${NC}"
}

verify_deployment() {
  local env=$1
  local region=$2

  echo -e "${YELLOW}Verifying deployment in ${env}-${region}...${NC}"

  # Switch context
  kubectl config use-context "arn:aws:eks:${region}:${AWS_ACCOUNT_ID}:cluster/${env}-${region}"

  # Check DaemonSet status
  echo "DaemonSet Status:"
  kubectl get daemonset -n amazon-cloudwatch aws-for-fluent-bit

  # Check pods
  echo ""
  echo "Pod Status:"
  kubectl get pods -n amazon-cloudwatch -l app.kubernetes.io/name=aws-for-fluent-bit

  # Check logs from one pod
  echo ""
  echo "Recent logs from Fluent Bit:"
  kubectl logs -n amazon-cloudwatch -l app.kubernetes.io/name=aws-for-fluent-bit --tail=20
}

# Main logic
case "${1}" in
  "all")
    for env in "${ENVIRONMENTS[@]}"; do
      for region in "${REGIONS[@]}"; do
        deploy_cluster "$env" "$region"
      done
    done
    ;;
  "verify")
    if [[ -z "$2" || -z "$3" ]]; then
      usage
    fi
    verify_deployment "$2" "$3"
    ;;
  *)
    if [[ -z "$1" || -z "$2" ]]; then
      usage
    fi
    deploy_cluster "$1" "$2"
    ;;
esac
```
Make it executable:
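```bash
chmod +x deploy.sh
```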
Deploy to Your Cluster
```bash
# Set your AWS account ID
export AWS_ACCOUNT_ID="123456789012"
# Deploy to a specific cluster
./deploy.sh stage us-west-2
# Or deploy to all clusters
./deploy.sh all
```
Manual Deployment with Helm
If you prefer not to use Helmfile:
```bash
# Add the Helm repository
helm repo add aws-eks-charts https://aws.github.io/eks-charts
helm repo update
# Create the namespace
kubectl create namespace amazon-cloudwatch
# Install the chart
helm install aws-for-fluent-bit aws-eks-charts/aws-for-fluent-bit \
--namespace amazon-cloudwatch \
--values values-stage-us-west-2.yml
```
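For later configuration changes, upgrade the release in place instead of reinstalling, and roll back if the new configuration misbehaves:

```bash
# Apply an updated values file to the existing release
helm upgrade aws-for-fluent-bit aws-eks-charts/aws-for-fluent-bit \
  --namespace amazon-cloudwatch \
  --values values-stage-us-west-2.yml

# Roll back to the previous revision if needed
helm rollback aws-for-fluent-bit --namespace amazon-cloudwatch
```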
Verifying the Deployment
After deployment, verify everything is working correctly.
Check DaemonSet Status
```bash
kubectl get daemonset -n amazon-cloudwatch
# Expected output:
# NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
# aws-for-fluent-bit 3 3 3 3 3 <none> 5m
```
The DESIRED count should match your number of nodes, and all should be READY.
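A quick way to compare the two numbers directly (taints and tolerations aside, the counts should line up):

```bash
# Node count vs. ready Fluent Bit pods
kubectl get nodes --no-headers | wc -l
kubectl get daemonset aws-for-fluent-bit -n amazon-cloudwatch \
  -o jsonpath='{.status.numberReady}{"\n"}'
```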
Check Pod Status
```bash
kubectl get pods -n amazon-cloudwatch -o wide
# Expected output:
# NAME READY STATUS RESTARTS AGE IP NODE
# aws-for-fluent-bit-abc12 1/1 Running 0 5m 10.0.1.50 ip-10-0-1-100.compute.internal
# aws-for-fluent-bit-def34 1/1 Running 0 5m 10.0.2.50 ip-10-0-2-100.compute.internal
# aws-for-fluent-bit-ghi56 1/1 Running 0 5m 10.0.3.50 ip-10-0-3-100.compute.internal
```
Check Fluent Bit Logs
```bash
kubectl logs -n amazon-cloudwatch -l app.kubernetes.io/name=aws-for-fluent-bit --tail=50
# Look for successful CloudWatch connection:
# [2025/02/18 10:00:00] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Created log group /aws/eks/staging-us-west-2/application
```
Verify Service Account IRSA
```bash
# Check service account annotations
kubectl get serviceaccount -n amazon-cloudwatch aws-for-fluent-bit -o yaml
# Should show:
# metadata:
# annotations:
# eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/staging-us-west-2-fluent-bit-cloudwatch-role
```
Check CloudWatch Log Groups
```bash
aws logs describe-log-groups --log-group-name-prefix /aws/eks/staging-us-west-2
# Expected output:
# {
# "logGroups": [
# {
# "logGroupName": "/aws/eks/staging-us-west-2/application",
# "creationTime": 1708251600000,
# "storedBytes": 12345
# }
# ]
# }
```
Test Log Delivery
Create a test pod that generates logs:
```bash
# Create a test pod (single quotes keep $(date) from expanding on your local shell)
kubectl run log-test --image=busybox --restart=Never -- \
  sh -c 'while true; do echo "Test log message at $(date)"; sleep 5; done'
# Wait a minute, then check CloudWatch
aws logs filter-log-events \
--log-group-name /aws/eks/staging-us-west-2/application \
--filter-pattern "Test log message"
# Clean up
kubectl delete pod log-test
```
Understanding the Log Routing
Our configuration routes logs to different CloudWatch Log Groups based on the Kubernetes namespace:
| Namespace | Log Group |
|---|---|
| prod | /aws/eks/{cluster}/prod |
| stage | /aws/eks/{cluster}/stage |
| All others | /aws/eks/{cluster}/application |
Log Stream Naming
Each pod gets its own log stream with the naming pattern:
```
pod-{pod-name}.{namespace}.{container-name}
```
For example:
- pod-myapp-7d9f8b6c5d-abc12.prod.myapp
- pod-redis-0.stage.redis
- pod-nginx-ingress-controller-xyz.kube-system.controller
Querying Logs with CloudWatch Logs Insights
Once logs are in CloudWatch, you can use Logs Insights for powerful queries:
```
# Find all errors in the prod namespace
fields @timestamp, @message, k8s_pod_name
| filter @logGroup = '/aws/eks/staging-us-west-2/prod'
| filter @message like /error|Error|ERROR/
| sort @timestamp desc
| limit 100
```
```
# Count logs by pod in the last hour
fields @timestamp, k8s_pod_name
| filter @logGroup = '/aws/eks/staging-us-west-2/application'
| stats count(*) by k8s_pod_name
| sort count desc
| limit 20
```
```
# Find slow API requests (assuming JSON logs with duration field)
fields @timestamp, @message, duration
| filter @logGroup = '/aws/eks/staging-us-west-2/prod'
| filter duration > 1000
| sort duration desc
| limit 50
```
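The same queries can be run from the CLI, which is handy for scripting; a minimal sketch (the date arithmetic below uses GNU date, so adjust on macOS):

```bash
# Start a Logs Insights query over the last hour
QUERY_ID=$(aws logs start-query \
  --log-group-name /aws/eks/staging-us-west-2/application \
  --start-time "$(date -d '1 hour ago' +%s)" \
  --end-time "$(date +%s)" \
  --query-string 'fields @timestamp, @message | sort @timestamp desc | limit 20' \
  --query 'queryId' --output text)

# Poll for results (repeat until the status is Complete)
aws logs get-query-results --query-id "${QUERY_ID}"
```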
Troubleshooting Common Issues
Issue 1: Pods in CrashLoopBackOff
Symptoms:
```bash
kubectl get pods -n amazon-cloudwatch
# NAME READY STATUS RESTARTS AGE
# aws-for-fluent-bit-abc12 0/1 CrashLoopBackOff 5 10m
```
Solution:
Check logs for configuration errors:
```bash
kubectl logs -n amazon-cloudwatch aws-for-fluent-bit-abc12 --previous
# Common causes:
# - Invalid Fluent Bit configuration syntax
# - Missing environment variables
# - Invalid IAM role ARN
```
Issue 2: No Logs in CloudWatch
Symptoms: Pods are running but no logs appear in CloudWatch.
Debugging Steps:
- Check Fluent Bit metrics:
```bash
kubectl port-forward -n amazon-cloudwatch svc/aws-for-fluent-bit 2020:2020 &
curl http://localhost:2020/api/v1/metrics/prometheus
```
- Check for AWS errors:
```bash
kubectl logs -n amazon-cloudwatch -l app.kubernetes.io/name=aws-for-fluent-bit | grep -i error
```
- Verify IRSA is working:
```bash
# Exec into a Fluent Bit pod
kubectl exec -it -n amazon-cloudwatch aws-for-fluent-bit-abc12 -- sh
# Try to get credentials (inside the pod)
aws sts get-caller-identity
```
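If the Fluent Bit image doesn't ship the AWS CLI, you can instead launch a throwaway pod that uses the same service account and see which identity it assumes (a sketch; it assumes the amazon/aws-cli image is pullable from your nodes):

```bash
# Run a one-off pod with the Fluent Bit service account and check the assumed identity
kubectl run irsa-check -n amazon-cloudwatch --rm -it --restart=Never \
  --image=amazon/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"aws-for-fluent-bit"}}' \
  -- sts get-caller-identity
# The "Arn" should reference assumed-role/<your-fluent-bit-role>/...
```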
Issue 3: IRSA Not Working
Symptoms:
```
AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/... is not authorized
```
Solution:
- Verify the service account annotation:
```bash
kubectl get sa -n amazon-cloudwatch aws-for-fluent-bit -o jsonpath='{.metadata.annotations}'
```
- Verify the IAM role trust policy matches the OIDC provider and service account (see the command below)
- Ensure the namespace and service account name match exactly
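For the trust policy check, you can pull it straight from IAM and compare the aud and sub conditions against your OIDC provider and service account (role name taken from the Terraform example above):

```bash
# Show the role's trust policy
aws iam get-role \
  --role-name staging-us-west-2-fluent-bit-cloudwatch-role \
  --query 'Role.AssumeRolePolicyDocument' \
  --output json
```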
Issue 4: High Memory Usage
Symptoms: Fluent Bit pods being OOMKilled.
Solution:
- Increase memory limits:
```yaml
resources:
  limits:
    memory: 500Mi  # Increase from 250Mi
```
- Reduce buffer sizes:
```ini
[INPUT]
    Mem_Buf_Limit  2MB   # Reduce from 5MB
```
- Enable more aggressive flushing:
```ini
[SERVICE]
    Flush  1   # Flush every second instead of 5
```
Issue 5: Missing Kubernetes Metadata
Symptoms: Logs arrive in CloudWatch but without pod name, namespace, etc.
Solution:
Ensure the Kubernetes filter is correctly configured:
```ini
[FILTER]
    Name             kubernetes
    Match            kube.*
    Kube_URL         https://kubernetes.default.svc:443
    Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token
```
Also verify the service account has permissions to query the Kubernetes API.
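You can test those permissions directly by impersonating the service account (the chart's ClusterRole should allow at least read access to pods):

```bash
# Impersonate the Fluent Bit service account and check pod read access
kubectl auth can-i get pods --all-namespaces \
  --as=system:serviceaccount:amazon-cloudwatch:aws-for-fluent-bit
# yes
```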
Best Practices and Recommendations
1. Set Log Retention Policies
CloudWatch logs are retained indefinitely by default. Set retention policies to control costs:
```bash
# Set 30-day retention
aws logs put-retention-policy \
--log-group-name /aws/eks/staging-us-west-2/application \
--retention-in-days 30
```
Or use Terraform:
```hcl
resource "aws_cloudwatch_log_group" "eks_app_logs" {
  name              = "/aws/eks/${var.cluster_name}/application"
  retention_in_days = 30

  tags = {
    Environment = var.environment
    Cluster     = var.cluster_name
  }
}
```
2. Use Structured JSON Logging
Configure your applications to output structured JSON logs:
```json
{
  "timestamp": "2025-02-18T10:00:00Z",
  "level": "INFO",
  "message": "User logged in",
  "user_id": "12345",
  "duration_ms": 150
}
```
This enables powerful queries in CloudWatch Logs Insights:
```
fields @timestamp, message, user_id, duration_ms
| filter level = 'ERROR'
| sort @timestamp desc
```
3. Implement Log-Based Alarms
Create CloudWatch Alarms based on log patterns:
```bash
# Create a metric filter for errors
aws logs put-metric-filter \
--log-group-name /aws/eks/staging-us-west-2/prod \
--filter-name ErrorCount \
--filter-pattern "ERROR" \
--metric-transformations \
metricName=ErrorCount,metricNamespace=EKSLogs,metricValue=1
# Create an alarm
aws cloudwatch put-metric-alarm \
--alarm-name "EKS-Prod-High-Errors" \
--metric-name ErrorCount \
--namespace EKSLogs \
--statistic Sum \
--period 300 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-west-2:123456789012:alerts
```
4. Exclude Noisy Logs
Filter out verbose or unnecessary logs to reduce costs:
```ini
[FILTER]
    Name     grep
    Match    kube.*
    Exclude  log healthcheck
    Exclude  log kube-probe
```
5. Use Separate IAM Roles Per Cluster
For security and blast radius reduction, use separate IAM roles for each cluster:
- staging-us-west-2-fluent-bit-role
- prod-us-west-2-fluent-bit-role
This allows you to:
- Revoke access to one cluster without affecting others
- Apply different permissions per environment
- Track CloudWatch API calls per cluster
6. Monitor Fluent Bit Itself
Create a dashboard to monitor Fluent Bit health:
```bash
# Expose metrics for Prometheus
# The Helm chart already enables this on port 2020

# Create a ServiceMonitor for Prometheus Operator
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: aws-for-fluent-bit
  endpoints:
    - port: http
      interval: 30s
      path: /api/v1/metrics/prometheus
EOF
```
Key metrics to monitor:
- fluentbit_input_records_total - Records read
- fluentbit_output_retries_total - CloudWatch retries
- fluentbit_output_errors_total - Output errors
7. Plan for Multi-Cluster
If you have multiple clusters, standardize your log group naming:
```
/aws/eks/{environment}-{region}/{namespace}
```
This enables cross-cluster queries in CloudWatch Logs Insights by selecting multiple log groups.
Conclusion
You now have a production-ready logging pipeline that:
- Collects logs from all containers in your EKS cluster
- Enriches logs with Kubernetes metadata (pod name, namespace, labels)
- Routes logs to organized CloudWatch Log Groups
- Uses secure IRSA authentication (no static credentials)
- Scales automatically with your cluster (DaemonSet)
This setup provides the foundation for observability in your EKS clusters. You can extend it by:
- Adding CloudWatch Alarms for error detection
- Creating dashboards in CloudWatch
- Setting up log-based anomaly detection
- Exporting logs to S3 for long-term archival
Fluent Bit’s efficiency makes it ideal for high-throughput logging scenarios, and its tight integration with AWS services through the official aws-for-fluent-bit chart makes it the recommended choice for EKS logging to CloudWatch.
Next Steps
- Set up CloudWatch Dashboards: Create visualizations for your log data
- Configure Alarms: Alert on error patterns and anomalies
- Implement Log Insights Queries: Save common queries for your team
- Consider Log Archival: Export to S3 for long-term, cost-effective storage
- Explore Container Insights: Enable full observability with metrics and traces
Have questions or run into issues? Feel free to reach out in the comments below!