Complete Guide to Setting Up Autoscaling with Karpenter on Amazon EKS


Kubernetes cluster autoscaling has evolved significantly over the years. If you’re still using the traditional Cluster Autoscaler, you’re likely experiencing slow scaling times (3-5 minutes) and the operational overhead of managing multiple node groups. Enter Karpenter - a flexible, high-performance Kubernetes cluster autoscaler that can provision right-sized compute resources in under 60 seconds.

In this comprehensive guide, I’ll walk you through setting up Karpenter on Amazon EKS using Terraform for infrastructure management, from the basics to a fully operational autoscaling solution.

Table of Contents

  1. What is Karpenter?
  2. Prerequisites
  3. Architecture Overview
  4. Step 1: Setting Up IAM Roles with Terraform
  5. Step 2: Installing Karpenter Using Helm
  6. Step 3: Configuring EC2NodeClasses
  7. Step 4: Creating NodePools
  8. Step 5: Testing and Validation
  9. Monitoring and Troubleshooting
  10. Best Practices and Cost Optimization

What is Karpenter?

Karpenter is an open-source Kubernetes cluster autoscaler built by AWS that dramatically improves upon the traditional Cluster Autoscaler. Here’s why it’s a game-changer:

  • Fast Provisioning: Nodes appear in 30-60 seconds vs 3-5 minutes with Cluster Autoscaler
  • Right-Sizing: Automatically selects optimal instance types based on actual pod requirements
  • Cost Optimization: Uses Spot instances and consolidates underutilized nodes
  • Simplified Configuration: Single NodePool instead of multiple node groups
  • Instance Diversity: Selects from a wide range of instance types for better Spot availability

In practice, organizations adopting Karpenter commonly report 40-60% cost reductions compared to managed node groups with the Cluster Autoscaler, driven largely by Spot usage, right-sizing, and consolidation.

Prerequisites

Before we begin, ensure you have the following (verification commands follow the list):

  • AWS Account with appropriate permissions
  • Existing EKS Cluster (1.21 or later)
  • Terraform (1.0 or later) installed
  • kubectl configured to access your cluster
  • Helm 3.x installed
  • AWS CLI configured with appropriate credentials
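
A minimal way to verify these prerequisites from the shell is shown below; my-cluster and us-west-2 are placeholders to replace with your own cluster name and region.

# Quick prerequisite sanity check
terraform version
kubectl version --client
helm version --short
aws --version

# Confirm AWS credentials and EKS cluster access
aws sts get-caller-identity
aws eks describe-cluster --name my-cluster --region us-west-2 \
  --query 'cluster.{Version:version,Status:status,OIDC:identity.oidc.issuer}'
kubectl get nodes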

Version Information

For this guide, we’ll use:

  • Karpenter: v1.8.3 (latest stable as of January 2026)
  • Kubernetes: 1.28+
  • Terraform: 1.5+

Architecture Overview

Karpenter’s architecture consists of several key components:

┌─────────────────────────────────────────────────────┐
│                   EKS Cluster                       │
│  ┌──────────────┐         ┌──────────────────────┐ │
│  │  Karpenter   │         │   Unschedulable      │ │
│  │  Controller  │ watches │      Pods            │ │
│  └──────────────┘         └──────────────────────┘ │
│         │                                           │
│         │ provisions                                │
│         ▼                                           │
│  ┌──────────────────────────────────────────────┐  │
│  │          Karpenter Nodes                     │  │
│  │  (created based on NodePool requirements)    │  │
│  └──────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
         │ AWS API calls
┌─────────────────────────────────────────────────────┐
│                   AWS Services                       │
│  • EC2 (instance provisioning)                      │
│  • IAM (role assumption via IRSA)                   │
│  • SQS (spot interruption handling - optional)      │
└─────────────────────────────────────────────────────┘

Key Concepts

  • NodePool: Defines the constraints and requirements for nodes (instance types, capacity type, availability zones)
  • EC2NodeClass: AWS-specific configuration (AMIs, subnets, security groups, IAM roles)
  • Consolidation: Automatic right-sizing and removal of underutilized nodes
  • Disruption Budget: Controls how aggressively Karpenter can terminate nodes

Step 1: Setting Up IAM Roles with Terraform

Karpenter requires two IAM roles:

  1. Karpenter Controller Role: Used by the Karpenter controller pods (via IRSA)
  2. Karpenter Node Role: Used by EC2 instances provisioned by Karpenter

1.1 Karpenter Controller IAM Role

First, create the IAM policy for the Karpenter controller:

# karpenter-controller-policy.tf

data "aws_iam_policy_document" "karpenter_controller" {
  statement {
    sid    = "AllowScopedEC2InstanceAccessActions"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet"
    ]
    resources = [
      "arn:aws:ec2:${var.region}::image/*",
      "arn:aws:ec2:${var.region}::snapshot/*",
      "arn:aws:ec2:${var.region}:*:security-group/*",
      "arn:aws:ec2:${var.region}:*:subnet/*"
    ]
  }

  statement {
    sid    = "AllowScopedEC2LaunchTemplateAccessActions"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet"
    ]
    resources = ["arn:aws:ec2:${var.region}:*:launch-template/*"]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid    = "AllowScopedEC2InstanceActionsWithTags"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet",
      "ec2:CreateLaunchTemplate"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:fleet/*",
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:volume/*",
      "arn:aws:ec2:${var.region}:*:network-interface/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*",
      "arn:aws:ec2:${var.region}:*:spot-instances-request/*"
    ]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid    = "AllowScopedResourceCreationTagging"
    effect = "Allow"
    actions = ["ec2:CreateTags"]
    resources = [
      "arn:aws:ec2:${var.region}:*:fleet/*",
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:volume/*",
      "arn:aws:ec2:${var.region}:*:network-interface/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*",
      "arn:aws:ec2:${var.region}:*:spot-instances-request/*"
    ]

    condition {
      test     = "StringEquals"
      variable = "ec2:CreateAction"
      values = [
        "RunInstances",
        "CreateFleet",
        "CreateLaunchTemplate"
      ]
    }
  }

  statement {
    sid    = "AllowScopedDeletion"
    effect = "Allow"
    actions = [
      "ec2:TerminateInstances",
      "ec2:DeleteLaunchTemplate"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*"
    ]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid    = "AllowRegionalReadActions"
    effect = "Allow"
    actions = [
      "ec2:DescribeAvailabilityZones",
      "ec2:DescribeImages",
      "ec2:DescribeInstances",
      "ec2:DescribeInstanceTypeOfferings",
      "ec2:DescribeInstanceTypes",
      "ec2:DescribeLaunchTemplates",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeSpotPriceHistory",
      "ec2:DescribeSubnets"
    ]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid       = "AllowSSMReadActions"
    effect    = "Allow"
    actions   = ["ssm:GetParameter"]
    resources = ["arn:aws:ssm:${var.region}::parameter/aws/service/*"]
  }

  statement {
    sid       = "AllowPricingReadActions"
    effect    = "Allow"
    actions   = ["pricing:GetProducts"]
    resources = ["*"]
  }

  statement {
    sid    = "AllowPassingInstanceRole"
    effect = "Allow"
    actions = ["iam:PassRole"]
    resources = [aws_iam_role.karpenter_node.arn]
  }

  statement {
    sid    = "AllowScopedInstanceProfileCreationActions"
    effect = "Allow"
    actions = ["iam:CreateInstanceProfile"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid    = "AllowScopedInstanceProfileTagActions"
    effect = "Allow"
    actions = ["iam:TagInstanceProfile"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid       = "AllowScopedInstanceProfileActions"
    effect    = "Allow"
    actions = [
      "iam:AddRoleToInstanceProfile",
      "iam:RemoveRoleFromInstanceProfile",
      "iam:DeleteInstanceProfile"
    ]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid       = "AllowInstanceProfileReadActions"
    effect    = "Allow"
    actions   = ["iam:GetInstanceProfile"]
    resources = ["*"]
  }

  statement {
    sid       = "AllowAPIServerEndpointDiscovery"
    effect    = "Allow"
    actions   = ["eks:DescribeCluster"]
    resources = ["arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:cluster/${var.cluster_name}"]
  }
}

resource "aws_iam_policy" "karpenter_controller" {
  name        = "KarpenterControllerPolicy-${var.cluster_name}"
  description = "IAM policy for Karpenter controller"
  policy      = data.aws_iam_policy_document.karpenter_controller.json
}

Create the IAM role for the controller using IRSA:

# karpenter-controller-role.tf

data "aws_iam_policy_document" "karpenter_controller_assume_role" {
  statement {
    effect = "Allow"

    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }

    actions = ["sts:AssumeRoleWithWebIdentity"]

    condition {
      test     = "StringEquals"
      variable = "${replace(var.oidc_provider_arn, "/^(.*provider/)/", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "${replace(var.oidc_provider_arn, "/^(.*provider/)/", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }
  }
}

resource "aws_iam_role" "karpenter_controller" {
  name               = "karpenter-controller-${var.cluster_name}"
  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role.json
}

resource "aws_iam_role_policy_attachment" "karpenter_controller" {
  role       = aws_iam_role.karpenter_controller.name
  policy_arn = aws_iam_policy.karpenter_controller.arn
}

1.2 Karpenter Node IAM Role

Create the IAM role for nodes:

# karpenter-node-role.tf

data "aws_iam_policy_document" "karpenter_node_assume_role" {
  statement {
    effect = "Allow"

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }

    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "karpenter_node" {
  name               = "karpenter-node-${var.cluster_name}"
  assume_role_policy = data.aws_iam_policy_document.karpenter_node_assume_role.json
}

# Attach required AWS managed policies
resource "aws_iam_role_policy_attachment" "karpenter_node_eks_worker" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_eks_cni" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_ecr_read" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_ssm" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Create instance profile
resource "aws_iam_instance_profile" "karpenter_node" {
  name = "karpenter-node-${var.cluster_name}"
  role = aws_iam_role.karpenter_node.name
}

1.3 Tagging Resources for Discovery

Karpenter uses tags to discover resources. Tag your VPC subnets and security groups:

# tags.tf

resource "aws_ec2_tag" "subnet_tags" {
  for_each    = toset(var.private_subnet_ids)
  resource_id = each.value
  key         = "karpenter.sh/discovery"
  value       = var.cluster_name
}

resource "aws_ec2_tag" "security_group_tags" {
  for_each    = toset(var.node_security_group_ids)
  resource_id = each.value
  key         = "karpenter.sh/discovery"
  value       = var.cluster_name
}

1.4 Update aws-auth ConfigMap

Add the Karpenter node role to the aws-auth ConfigMap:

# aws-auth.tf

resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode(concat(
      var.existing_map_roles,
      [{
        rolearn  = aws_iam_role.karpenter_node.arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = [
          "system:bootstrappers",
          "system:nodes"
        ]
      }]
    ))
  }

  force = true
}

1.5 Variables and Outputs

Create variables file:

# variables.tf

variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
}

variable "region" {
  description = "AWS region"
  type        = string
}

variable "oidc_provider_arn" {
  description = "ARN of the OIDC provider for the EKS cluster"
  type        = string
}

variable "private_subnet_ids" {
  description = "List of private subnet IDs for Karpenter nodes"
  type        = list(string)
}

variable "node_security_group_ids" {
  description = "List of security group IDs for Karpenter nodes"
  type        = list(string)
}

And outputs:

# outputs.tf

output "karpenter_controller_role_arn" {
  description = "ARN of the Karpenter controller IAM role"
  value       = aws_iam_role.karpenter_controller.arn
}

output "karpenter_node_role_name" {
  description = "Name of the Karpenter node IAM role"
  value       = aws_iam_role.karpenter_node.name
}

output "karpenter_node_instance_profile_name" {
  description = "Name of the Karpenter node instance profile"
  value       = aws_iam_instance_profile.karpenter_node.name
}

Apply the Terraform configuration:

terraform init
terraform plan
terraform apply
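
After the apply completes, it is worth spot-checking the created resources from the CLI before moving on. The commands below assume the naming used in the Terraform above (karpenter-controller-<CLUSTER_NAME>, karpenter-node-<CLUSTER_NAME>) and the karpenter.sh/discovery tag; replace <CLUSTER_NAME> accordingly.

# Confirm the IAM roles and instance profile exist
aws iam get-role --role-name karpenter-controller-<CLUSTER_NAME> --query 'Role.Arn'
aws iam get-role --role-name karpenter-node-<CLUSTER_NAME> --query 'Role.Arn'
aws iam get-instance-profile --instance-profile-name karpenter-node-<CLUSTER_NAME> \
  --query 'InstanceProfile.Roles[].RoleName'

# Confirm subnets and security groups carry the discovery tag
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=<CLUSTER_NAME>" \
  --query 'Subnets[].SubnetId'
aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=<CLUSTER_NAME>" \
  --query 'SecurityGroups[].GroupId'

# Confirm the node role was added to aws-auth
kubectl get configmap aws-auth -n kube-system -o yaml | grep karpenter-node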

Step 2: Installing Karpenter Using Helm

2.1 Create Helm Values File

Create a values file for Karpenter:

# karpenter-values.yaml

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/karpenter-controller-<CLUSTER_NAME>

settings:
  # Cluster name for discovery
  clusterName: <CLUSTER_NAME>

  # Interruption queue for spot instance handling (optional but recommended)
  interruptionQueue: <CLUSTER_NAME>

# Controller resource requests/limits
controller:
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 2
      memory: 2Gi

# Replica count (increase to 2 for production HA)
replicas: 1

# Webhook settings
webhook:
  enabled: true
  port: 8443

Replace <ACCOUNT_ID> and <CLUSTER_NAME> with your actual values.
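
If you prefer not to edit the file by hand, the placeholders can be filled in from the shell. This is a small convenience sketch using sed; the variable names are illustrative and the file name matches the example above.

# Substitute the placeholders in karpenter-values.yaml
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
CLUSTER_NAME=my-cluster   # replace with your cluster name

sed -i \
  -e "s/<ACCOUNT_ID>/${ACCOUNT_ID}/g" \
  -e "s/<CLUSTER_NAME>/${CLUSTER_NAME}/g" \
  karpenter-values.yaml   # on macOS, use: sed -i '' ...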

2.2 Install Karpenter

Install Karpenter using Helm:

# Karpenter is published as an OCI chart, so it is installed directly from the
# registry (helm repo add/update are not used for OCI charts)

# Install Karpenter
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version 1.8.3 \
  --values karpenter-values.yaml \
  --wait

2.3 Verify Installation

Check that Karpenter is running:

kubectl get pods -n karpenter
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50

You should see output similar to:

NAME                         READY   STATUS    RESTARTS   AGE
karpenter-xxxxxxxxxx-xxxxx   1/1     Running   0          1m
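
It is also worth confirming that the chart installed the Karpenter CRDs, since the EC2NodeClass and NodePool resources created in the next steps depend on them (exact CRD names can vary slightly between versions).

# The chart should have installed CRDs similar to these:
#   ec2nodeclasses.karpenter.k8s.aws
#   nodeclaims.karpenter.sh
#   nodepools.karpenter.sh
kubectl get crds | grep -E 'karpenter\.sh|karpenter\.k8s\.aws'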

Step 3: Configuring EC2NodeClasses

EC2NodeClasses define AWS-specific configuration for nodes. Let’s create two: one for general workloads and one for GPU workloads.

3.1 General Workload EC2NodeClass

# ec2nodeclass-general.yaml

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # IAM role for nodes (created by Terraform)
  role: karpenter-node-<CLUSTER_NAME>

  # Subnet discovery using karpenter.sh/discovery tag
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  # Security group discovery using karpenter.sh/discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  # AMI selection using EKS-optimized AMIs
  # Use latest AL2023 AMIs with automatic updates
  amiSelectorTerms:
    - alias: al2023@latest

  # User data for node initialization
  userData: |
    #!/bin/bash
    # Configure kubelet with custom settings
    echo "Running custom node initialization..."

    # Install any additional packages or configurations here
    # Example: yum install -y htop

  # Block device mappings for root volume
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true

  # Metadata options for security
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required # Require IMDSv2

  # Enable detailed CloudWatch monitoring
  detailedMonitoring: false

  # Tags to apply to all resources
  tags:
    karpenter.sh/discovery: <CLUSTER_NAME>
    environment: production
    managed-by: karpenter

3.2 GPU Workload EC2NodeClass

# ec2nodeclass-gpu.yaml

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  role: karpenter-node-<CLUSTER_NAME>

  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  # Use GPU-optimized AMI
  amiSelectorTerms:
    - alias: al2023@latest

  # Custom user data for GPU nodes
  userData: |
    #!/bin/bash
    echo "Initializing GPU node..."

    # NVIDIA drivers are included in EKS GPU AMIs
    # Additional GPU-specific configuration can go here

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi # Larger for GPU workloads
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true

  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required

  detailedMonitoring: true # Enable for GPU nodes

  tags:
    karpenter.sh/discovery: <CLUSTER_NAME>
    environment: production
    workload-type: gpu
    managed-by: karpenter

Apply the EC2NodeClasses:

kubectl apply -f ec2nodeclass-general.yaml
kubectl apply -f ec2nodeclass-gpu.yaml

# Verify
kubectl get ec2nodeclasses -n karpenter
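
Beyond listing them, it helps to confirm that each EC2NodeClass has resolved its subnets, security groups, and AMIs, which Karpenter reports in the resource status. The field names below assume the v1 API used in this guide.

# Check readiness and the resolved AWS resources
kubectl describe ec2nodeclass default | grep -A 5 'Conditions'
kubectl get ec2nodeclass default \
  -o jsonpath='{.status.subnets[*].id}{"\n"}{.status.securityGroups[*].id}{"\n"}'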

Step 4: Creating NodePools

NodePools define the constraints and requirements for nodes that Karpenter will provision.

4.1 General Workload NodePool

# nodepool-general.yaml

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  # Template for nodes created by this NodePool
  template:
    metadata:
      labels:
        workload-type: general

    spec:
      # Reference to the default EC2NodeClass
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      # Requirements for general workload nodes
      requirements:
        # Architecture - amd64 for compatibility
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]

        # Operating System
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]

        # Capacity type - prefer spot instances for cost savings
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

        # Instance types - diversified for better spot availability
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            # T3a family (burstable, cost-effective)
            - t3a.medium
            - t3a.large
            - t3a.xlarge
            # M6a family (balanced compute/memory)
            - m6a.large
            - m6a.xlarge
            - m6a.2xlarge
            # M6i family (latest generation)
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
            # M5 family (previous generation, often cheaper)
            - m5.large
            - m5.xlarge
            - m5.2xlarge
            # C6a family (compute optimized)
            - c6a.large
            - c6a.xlarge
            - c6a.2xlarge

        # Availability zones
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-west-2a
            - us-west-2b
            - us-west-2c

      # Node expiration - rotate nodes every 7 days for security updates
      expireAfter: 168h # 7 days

  # Resource limits for this NodePool
  limits:
    cpu: "500"
    memory: 2000Gi

  # Disruption budget for general nodes
  disruption:
    # Consolidate when nodes are empty OR underutilized for cost optimization
    consolidationPolicy: WhenEmptyOrUnderutilized

    # Wait 30 seconds before consolidating (fast scale-down)
    consolidateAfter: 30s

    # Disruption budgets - allow 10% of nodes to be disrupted at any time
    budgets:
      - nodes: "10%"

  # Weight for prioritization (higher number = higher priority)
  weight: 10

4.2 GPU Workload NodePool

# nodepool-gpu.yaml

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    metadata:
      labels:
        workload-type: gpu

    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu

      # Taints to prevent non-GPU workloads from scheduling
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule

      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]

        - key: kubernetes.io/os
          operator: In
          values: ["linux"]

        # On-demand only for GPU (more reliable than Spot)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]

        # GPU instance types (g4dn family)
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - g4dn.xlarge # 1 GPU, 4 vCPUs, 16 GB
            - g4dn.2xlarge # 1 GPU, 8 vCPUs, 32 GB
            - g4dn.4xlarge # 1 GPU, 16 vCPUs, 64 GB
            - g4dn.8xlarge # 1 GPU, 32 vCPUs, 128 GB
            - g4dn.12xlarge # 4 GPUs, 48 vCPUs, 192 GB

        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-west-2a
            - us-west-2b
            - us-west-2c

      # Longer expiration for GPU nodes (more expensive to churn)
      expireAfter: 720h # 30 days

  limits:
    cpu: "200"
    memory: 800Gi

  disruption:
    # Only consolidate when empty (preserve running GPU workloads)
    consolidationPolicy: WhenEmpty

    # Wait longer before consolidating GPU nodes
    consolidateAfter: 300s # 5 minutes

    budgets:
      - nodes: "0" # No automatic disruption for GPU nodes

  # Lower weight than general (lower priority)
  weight: 5

Apply the NodePools:

kubectl apply -f nodepool-general.yaml
kubectl apply -f nodepool-gpu.yaml

# Verify
kubectl get nodepools -n karpenter

Expected output:

NAME      READY   AGE
general   True    5s
gpu       True    5s
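
Each NodePool also tracks how much capacity it has provisioned against its limits, which is useful when tuning the cpu and memory caps above. The status field below assumes the v1 API used in this guide.

# Show the resources currently tracked by each NodePool
kubectl get nodepool general -o jsonpath='{.status.resources}{"\n"}'
kubectl get nodepool gpu -o jsonpath='{.status.resources}{"\n"}'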

Step 5: Testing and Validation

Now let’s test that Karpenter is working correctly.

5.1 Test General Workload Scaling

Create a test deployment:

# test-general-workload.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test-general
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: karpenter-test-general
  template:
    metadata:
      labels:
        app: karpenter-test-general
    spec:
      # Node selector to target general NodePool
      nodeSelector:
        workload-type: general

      containers:
        - name: nginx
          image: nginx:latest
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "1"
              memory: "2Gi"

Deploy and watch Karpenter provision nodes:

kubectl apply -f test-general-workload.yaml

# Watch nodes being created
kubectl get nodes -l karpenter.sh/nodepool -w

# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

You should see nodes appear within 30-60 seconds.
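
Karpenter represents each machine it launches as a NodeClaim, which is usually the quickest way to see which instance type and capacity type it chose for the pending pods.

# Inspect the NodeClaims created for this workload
kubectl get nodeclaims -o wide

# Drill into a single NodeClaim (use a name from the previous command)
kubectl describe nodeclaim <nodeclaim-name>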

5.2 Test GPU Workload Scaling

Create a GPU test deployment:

# test-gpu-workload.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test-gpu
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: karpenter-test-gpu
  template:
    metadata:
      labels:
        app: karpenter-test-gpu
    spec:
      nodeSelector:
        workload-type: gpu

      # Tolerate the GPU taint
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule

      containers:
        - name: cuda-test
          image: nvidia/cuda:11.8.0-base-ubuntu22.04
          command: ["sleep", "infinity"]
          resources:
            requests:
              nvidia.com/gpu: "1"
              cpu: "2"
              memory: "8Gi"
            limits:
              nvidia.com/gpu: "1"
              cpu: "2"
              memory: "8Gi"

Deploy and verify:

kubectl apply -f test-gpu-workload.yaml
kubectl get nodes -l workload-type=gpu -w

5.3 Test Consolidation

Delete the test workloads and watch Karpenter terminate unused nodes:

kubectl delete -f test-general-workload.yaml
kubectl delete -f test-gpu-workload.yaml

# Watch nodes being terminated
kubectl get nodes -l karpenter.sh/nodepool -w

Nodes should be terminated within 30 seconds to 5 minutes depending on the consolidation policy.

5.4 Automated Validation Script

Create a comprehensive validation script:

#!/bin/bash
# karpenter-validation.sh

set -e

GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m'

echo -e "${GREEN}[1/6] Checking Karpenter controller...${NC}"
kubectl get deployment -n karpenter karpenter
kubectl wait --for=condition=available --timeout=60s deployment/karpenter -n karpenter

echo -e "${GREEN}[2/6] Checking NodePools...${NC}"
kubectl get nodepools -n karpenter

echo -e "${GREEN}[3/6] Checking EC2NodeClasses...${NC}"
kubectl get ec2nodeclasses -n karpenter

echo -e "${GREEN}[4/6] Deploying test workload...${NC}"
kubectl apply -f test-general-workload.yaml

echo -e "${GREEN}[5/6] Waiting for nodes to be provisioned (max 5 min)...${NC}"
timeout 300 bash -c 'until [ $(kubectl get nodes -l karpenter.sh/nodepool --no-headers | wc -l) -gt 0 ]; do sleep 5; done'

NODES=$(kubectl get nodes -l karpenter.sh/nodepool --no-headers | wc -l)
echo -e "${GREEN}✓ Karpenter provisioned $NODES node(s)${NC}"
kubectl get nodes -l karpenter.sh/nodepool

echo -e "${GREEN}[6/6] Cleaning up...${NC}"
kubectl delete -f test-general-workload.yaml

echo -e "${GREEN}✓ Validation complete!${NC}"

Run the validation:

chmod +x karpenter-validation.sh
./karpenter-validation.sh

Monitoring and Troubleshooting

Key Commands for Monitoring

# View NodePool status
kubectl get nodepools -n karpenter

# View provisioned nodes
kubectl get nodes -l karpenter.sh/nodepool

# Check controller logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

# View events
kubectl get events -n karpenter --sort-by='.lastTimestamp'

# View detailed NodePool information
kubectl describe nodepool general -n karpenter

Common Issues

1. Nodes Not Provisioning

Check Karpenter logs:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100

Common causes:

  • IAM permissions missing → Verify controller policy
  • Subnet capacity exhausted → Check subnet available IPs
  • Security groups misconfigured → Verify discovery tags
  • Instance type unavailable → Check AWS service quotas (see the CLI checks below)
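
The subnet-capacity and quota causes are easy to rule out from the CLI. The quota code below is my best understanding of the one for "Running On-Demand Standard instances"; confirm it in the Service Quotas console for your account.

# Free IP addresses in the Karpenter-discovered subnets
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=<CLUSTER_NAME>" \
  --query 'Subnets[].{Subnet:SubnetId,FreeIPs:AvailableIpAddressCount}'

# Current vCPU quota for On-Demand Standard instance families (quota code is an assumption)
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --query 'Quota.{Name:QuotaName,Value:Value}'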

2. Karpenter Controller CrashLooping

kubectl get sa karpenter -n karpenter -o yaml
kubectl describe pod -n karpenter -l app.kubernetes.io/name=karpenter

Common causes:

  • IRSA misconfiguration → Verify IAM role ARN annotation
  • OIDC provider not configured → Check EKS cluster OIDC
  • Webhook certificate issues → Restart controller

3. Nodes Not Terminating

kubectl get pdb -A
kubectl get nodepools -n karpenter -o yaml

Common causes:

  • PodDisruptionBudgets too restrictive
  • Consolidation policy too conservative
  • do-not-disrupt annotation on pods (see the commands below)
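
The karpenter.sh/do-not-disrupt annotation in particular is easy to overlook. The snippet below lists pods that carry it (it assumes jq is installed) and shows how to set or remove it deliberately; <pod-name> is a placeholder.

# Find pods that block disruption via the annotation
kubectl get pods -A -o json | jq -r \
  '.items[]
   | select(.metadata.annotations["karpenter.sh/do-not-disrupt"] == "true")
   | .metadata.namespace + "/" + .metadata.name'

# Set or remove the annotation on a specific pod
kubectl annotate pod <pod-name> karpenter.sh/do-not-disrupt=true
kubectl annotate pod <pod-name> karpenter.sh/do-not-disrupt-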

Prometheus Metrics

Karpenter exposes metrics on port 8080:

kubectl port-forward -n karpenter svc/karpenter 8080:8080
curl http://localhost:8080/metrics | grep karpenter

Key metrics to monitor:

  • karpenter_nodes_created_total: Total nodes created
  • karpenter_nodes_terminated_total: Total nodes terminated
  • karpenter_provisioner_scheduling_duration_seconds: Time to provision nodes
  • karpenter_cloudprovider_instance_type_cpu_cores: CPU capacity by instance type

Best Practices and Cost Optimization

1. Instance Type Diversity

Allow Karpenter to choose from a wide range of instance types for better Spot availability and cost optimization:

requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
      - t3a.large
      - t3a.xlarge
      - m6a.large
      - m6a.xlarge
      - m6i.large
      - m6i.xlarge
      # Add 10-15 instance types for best results

2. Use Spot Instances Strategically

  • General workloads: 70-80% Spot, 20-30% On-Demand
  • Stateful workloads: 100% On-Demand
  • GPU workloads: 100% On-Demand (Spot GPU capacity is scarce and interruptions are costly for long-running jobs)

3. Set Resource Limits

Prevent runaway costs by setting limits:

limits:
  cpu: "500"
  memory: 2000Gi

4. Enable Consolidation

Aggressive consolidation saves money:

disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 30s

5. Separate NodePools by Workload Characteristics

Create separate NodePools for:

  • CPU-intensive workloads (compute-optimized instances)
  • Memory-intensive workloads (memory-optimized instances)
  • GPU workloads (GPU instances)
  • Burstable workloads (T-family instances)

6. Use Mutually Exclusive NodePools

Ensure NodePools don’t overlap to avoid random selection:

# NodePool 1: Only Spot
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]

# NodePool 2: Only On-Demand
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]

7. Pin AMI Versions in Production

For production, pin specific AMI versions to avoid unexpected updates:

amiSelectorTerms:
  - id: ami-0123456789abcdef0 # Specific AMI ID
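
To find a concrete AMI ID to pin, you can read the EKS-optimized AMI SSM parameters that Karpenter itself resolves. The parameter path below is my best understanding of the AL2023 x86_64 naming; adjust the Kubernetes version to match your cluster.

# Look up the current EKS-optimized AL2023 AMI for Kubernetes 1.28 (path is an assumption)
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.28/amazon-linux-2023/x86_64/standard/recommended/image_id \
  --query 'Parameter.Value' --output text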

8. Implement Spot Interruption Handling

Set up an SQS queue for graceful Spot interruption handling:

# Create SQS queue (via Terraform)
resource "aws_sqs_queue" "karpenter_interruption" {
  name = var.cluster_name
}

# Subscribe to EC2 Spot interruption events
resource "aws_cloudwatch_event_rule" "spot_interruption" {
  name = "${var.cluster_name}-spot-interruption"
  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = ["EC2 Spot Instance Interruption Warning"]
  })
}

# Deliver matching events to the queue
# (the queue also needs a policy allowing events.amazonaws.com to send messages)
resource "aws_cloudwatch_event_target" "spot_interruption" {
  rule = aws_cloudwatch_event_rule.spot_interruption.name
  arn  = aws_sqs_queue.karpenter_interruption.arn
}

Expected Cost Savings

Based on typical workload distributions:

Workload Type         % of Cluster   Savings vs Managed Node Groups
General (Spot)        70%            60-70%
General (On-demand)   10%            20-30%
GPU (On-demand)       20%            20-30%
Overall               100%           40-60%

For a cluster with $100,000 annual compute costs, expect savings of $40,000-$60,000 per year.

Conclusion

Karpenter represents a significant evolution in Kubernetes autoscaling, offering faster provisioning times, better cost optimization, and simpler configuration compared to the traditional Cluster Autoscaler.

Key takeaways:

  • Setup: Use Terraform for IAM roles, Helm for Karpenter installation
  • Configuration: Define NodePools for different workload types, use EC2NodeClasses for AWS-specific settings
  • Cost Optimization: Leverage Spot instances, enable consolidation, use instance diversity
  • Monitoring: Watch controller logs, track metrics, set up alerts
  • Testing: Validate with test workloads before migrating production traffic

By following this guide, you should have a fully functional Karpenter setup that can provision nodes in under 60 seconds and reduce your EKS compute costs by 40-60%.
