Setting Up Autoscaling with Karpenter on Amazon EKS: A Complete Guide
Kubernetes cluster autoscaling has evolved significantly over the years. If you’re still using the traditional Cluster Autoscaler, you’re likely experiencing slow scaling times (3-5 minutes) and the operational overhead of managing multiple node groups. Enter Karpenter - a flexible, high-performance Kubernetes cluster autoscaler that can provision right-sized compute resources in under 60 seconds.
In this comprehensive guide, I’ll walk you through setting up Karpenter on Amazon EKS using Terraform for infrastructure management, from the basics to a fully operational autoscaling solution.
Table of Contents
- What is Karpenter?
- Prerequisites
- Architecture Overview
- Step 1: Setting Up IAM Roles with Terraform
- Step 2: Installing Karpenter Using Helm
- Step 3: Configuring EC2NodeClasses
- Step 4: Creating NodePools
- Step 5: Testing and Validation
- Monitoring and Troubleshooting
- Best Practices and Cost Optimization
What is Karpenter?
Karpenter is an open-source Kubernetes cluster autoscaler built by AWS that dramatically improves upon the traditional Cluster Autoscaler. Here’s why it’s a game-changer:
- Fast Provisioning: Nodes appear in 30-60 seconds vs 3-5 minutes with Cluster Autoscaler
- Right-Sizing: Automatically selects optimal instance types based on actual pod requirements
- Cost Optimization: Uses Spot instances and consolidates underutilized nodes
- Simplified Configuration: Single NodePool instead of multiple node groups
- Instance Diversity: Selects from a wide range of instance types for better Spot availability
Organizations that migrate to Karpenter commonly report cost reductions in the 40-60% range compared to statically sized managed node groups running Cluster Autoscaler, though the actual savings depend heavily on workload shape and how aggressively you use Spot.
Prerequisites
Before we begin, ensure you have:
- AWS Account with appropriate permissions
- Existing EKS Cluster (Kubernetes 1.28 or later, to match the Karpenter version used below)
- Terraform (1.0 or later) installed
- kubectl configured to access your cluster
- Helm 3.x installed
- AWS CLI configured with appropriate credentials
For this guide, we’ll use:
- Karpenter: v1.8.3 (latest stable as of January 2026)
- Kubernetes: 1.28+
- Terraform: 1.5+
Architecture Overview
Karpenter’s architecture consists of several key components:
```text
┌──────────────────────────────────────────────────────┐
│                      EKS Cluster                      │
│                                                       │
│   ┌──────────────┐   watches   ┌───────────────────┐  │
│   │  Karpenter   │────────────▶│   Unschedulable   │  │
│   │  Controller  │             │       Pods        │  │
│   └──────────────┘             └───────────────────┘  │
│          │                                            │
│          │ provisions                                 │
│          ▼                                            │
│   ┌────────────────────────────────────────────────┐  │
│   │               Karpenter Nodes                  │  │
│   │   (created based on NodePool requirements)     │  │
│   └────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
                          │
                          │ AWS API calls
                          ▼
┌──────────────────────────────────────────────────────┐
│                     AWS Services                      │
│   • EC2 (instance provisioning)                       │
│   • IAM (role assumption via IRSA)                    │
│   • SQS (spot interruption handling, optional)        │
└──────────────────────────────────────────────────────┘
```
Key Concepts
- NodePool: Defines the constraints and requirements for nodes (instance types, capacity type, availability zones)
- EC2NodeClass: AWS-specific configuration (AMIs, subnets, security groups, IAM roles)
- Consolidation: Automatic right-sizing and removal of underutilized nodes
- Disruption Budget: Controls how aggressively Karpenter can terminate nodes
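The two custom resources work as a pair: a NodePool expresses scheduling constraints and points at an EC2NodeClass, which supplies the AWS-level details. A stripped-down sketch of that relationship follows; the full versions used in this guide appear in Steps 3 and 4.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: example
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: example        # the EC2NodeClass providing AMI, subnet, and IAM settings
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```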
Step 1: Setting Up IAM Roles with Terraform
Karpenter requires two IAM roles:
- Karpenter Controller Role: Used by the Karpenter controller pods (via IRSA)
- Karpenter Node Role: Used by EC2 instances provisioned by Karpenter
1.1 Karpenter Controller IAM Role
First, create the IAM policy for the Karpenter controller:
```hcl
# karpenter-controller-policy.tf

# Account ID used to scope the EKS and SQS ARNs below
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "karpenter_controller" {
  statement {
    sid    = "AllowScopedEC2InstanceAccessActions"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet"
    ]
    resources = [
      "arn:aws:ec2:${var.region}::image/*",
      "arn:aws:ec2:${var.region}::snapshot/*",
      "arn:aws:ec2:${var.region}:*:security-group/*",
      "arn:aws:ec2:${var.region}:*:subnet/*"
    ]
  }

  statement {
    sid    = "AllowScopedEC2LaunchTemplateAccessActions"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet"
    ]
    resources = ["arn:aws:ec2:${var.region}:*:launch-template/*"]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid    = "AllowScopedEC2InstanceActionsWithTags"
    effect = "Allow"
    actions = [
      "ec2:RunInstances",
      "ec2:CreateFleet",
      "ec2:CreateLaunchTemplate"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:fleet/*",
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:volume/*",
      "arn:aws:ec2:${var.region}:*:network-interface/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*",
      "arn:aws:ec2:${var.region}:*:spot-instances-request/*"
    ]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid     = "AllowScopedResourceCreationTagging"
    effect  = "Allow"
    actions = ["ec2:CreateTags"]
    resources = [
      "arn:aws:ec2:${var.region}:*:fleet/*",
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:volume/*",
      "arn:aws:ec2:${var.region}:*:network-interface/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*",
      "arn:aws:ec2:${var.region}:*:spot-instances-request/*"
    ]
    condition {
      test     = "StringEquals"
      variable = "ec2:CreateAction"
      values = [
        "RunInstances",
        "CreateFleet",
        "CreateLaunchTemplate"
      ]
    }
  }

  statement {
    sid    = "AllowScopedDeletion"
    effect = "Allow"
    actions = [
      "ec2:TerminateInstances",
      "ec2:DeleteLaunchTemplate"
    ]
    resources = [
      "arn:aws:ec2:${var.region}:*:instance/*",
      "arn:aws:ec2:${var.region}:*:launch-template/*"
    ]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid    = "AllowRegionalReadActions"
    effect = "Allow"
    actions = [
      "ec2:DescribeAvailabilityZones",
      "ec2:DescribeImages",
      "ec2:DescribeInstances",
      "ec2:DescribeInstanceTypeOfferings",
      "ec2:DescribeInstanceTypes",
      "ec2:DescribeLaunchTemplates",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeSpotPriceHistory",
      "ec2:DescribeSubnets"
    ]
    resources = ["*"]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid       = "AllowSSMReadActions"
    effect    = "Allow"
    actions   = ["ssm:GetParameter"]
    resources = ["arn:aws:ssm:${var.region}::parameter/aws/service/*"]
  }

  statement {
    sid       = "AllowPricingReadActions"
    effect    = "Allow"
    actions   = ["pricing:GetProducts"]
    resources = ["*"]
  }

  # Needed only if you configure settings.interruptionQueue in Step 2
  # (the queue name matches the cluster name, as created in the SQS section below)
  statement {
    sid    = "AllowInterruptionQueueActions"
    effect = "Allow"
    actions = [
      "sqs:DeleteMessage",
      "sqs:GetQueueAttributes",
      "sqs:GetQueueUrl",
      "sqs:ReceiveMessage"
    ]
    resources = ["arn:aws:sqs:${var.region}:${data.aws_caller_identity.current.account_id}:${var.cluster_name}"]
  }

  statement {
    sid       = "AllowPassingInstanceRole"
    effect    = "Allow"
    actions   = ["iam:PassRole"]
    resources = [aws_iam_role.karpenter_node.arn]
  }

  statement {
    sid       = "AllowScopedInstanceProfileCreationActions"
    effect    = "Allow"
    actions   = ["iam:CreateInstanceProfile"]
    resources = ["*"]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid       = "AllowScopedInstanceProfileTagActions"
    effect    = "Allow"
    actions   = ["iam:TagInstanceProfile"]
    resources = ["*"]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid    = "AllowScopedInstanceProfileActions"
    effect = "Allow"
    actions = [
      "iam:AddRoleToInstanceProfile",
      "iam:RemoveRoleFromInstanceProfile",
      "iam:DeleteInstanceProfile"
    ]
    resources = ["*"]
    condition {
      test     = "StringEquals"
      variable = "aws:RequestedRegion"
      values   = [var.region]
    }
  }

  statement {
    sid       = "AllowInstanceProfileReadActions"
    effect    = "Allow"
    actions   = ["iam:GetInstanceProfile"]
    resources = ["*"]
  }

  statement {
    sid       = "AllowAPIServerEndpointDiscovery"
    effect    = "Allow"
    actions   = ["eks:DescribeCluster"]
    resources = ["arn:aws:eks:${var.region}:${data.aws_caller_identity.current.account_id}:cluster/${var.cluster_name}"]
  }
}

resource "aws_iam_policy" "karpenter_controller" {
  name        = "KarpenterControllerPolicy-${var.cluster_name}"
  description = "IAM policy for Karpenter controller"
  policy      = data.aws_iam_policy_document.karpenter_controller.json
}
```
Create the IAM role for the controller using IRSA:
```hcl
# karpenter-controller-role.tf
data "aws_iam_policy_document" "karpenter_controller_assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Federated"
      identifiers = [var.oidc_provider_arn]
    }
    actions = ["sts:AssumeRoleWithWebIdentity"]
    condition {
      test     = "StringEquals"
      variable = "${replace(var.oidc_provider_arn, "/^(.*provider/)/", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "${replace(var.oidc_provider_arn, "/^(.*provider/)/", "")}:sub"
      values   = ["system:serviceaccount:karpenter:karpenter"]
    }
  }
}

resource "aws_iam_role" "karpenter_controller" {
  name               = "karpenter-controller-${var.cluster_name}"
  assume_role_policy = data.aws_iam_policy_document.karpenter_controller_assume_role.json
}

resource "aws_iam_role_policy_attachment" "karpenter_controller" {
  role       = aws_iam_role.karpenter_controller.name
  policy_arn = aws_iam_policy.karpenter_controller.arn
}
```
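If you would rather not pass the OIDC provider ARN in by hand, you can look it up from the cluster itself. A minimal sketch, assuming the cluster and its IAM OIDC provider already exist (the data source names are illustrative):

```hcl
# Optional: derive the OIDC provider ARN from the existing cluster
data "aws_eks_cluster" "this" {
  name = var.cluster_name
}

data "aws_iam_openid_connect_provider" "this" {
  url = data.aws_eks_cluster.this.identity[0].oidc[0].issuer
}

# Use data.aws_iam_openid_connect_provider.this.arn in place of var.oidc_provider_arn
```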
1.2 Karpenter Node IAM Role
Create the IAM role for nodes:
```hcl
# karpenter-node-role.tf
data "aws_iam_policy_document" "karpenter_node_assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "karpenter_node" {
  name               = "karpenter-node-${var.cluster_name}"
  assume_role_policy = data.aws_iam_policy_document.karpenter_node_assume_role.json
}

# Attach required AWS managed policies
resource "aws_iam_role_policy_attachment" "karpenter_node_eks_worker" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_eks_cni" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_ecr_read" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

resource "aws_iam_role_policy_attachment" "karpenter_node_ssm" {
  role       = aws_iam_role.karpenter_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Create instance profile
resource "aws_iam_instance_profile" "karpenter_node" {
  name = "karpenter-node-${var.cluster_name}"
  role = aws_iam_role.karpenter_node.name
}
```
1.3 Tagging Resources for Discovery
Karpenter uses tags to discover resources. Tag your VPC subnets and security groups:
```hcl
# tags.tf
resource "aws_ec2_tag" "subnet_tags" {
  for_each    = toset(var.private_subnet_ids)
  resource_id = each.value
  key         = "karpenter.sh/discovery"
  value       = var.cluster_name
}

resource "aws_ec2_tag" "security_group_tags" {
  for_each    = toset(var.node_security_group_ids)
  resource_id = each.value
  key         = "karpenter.sh/discovery"
  value       = var.cluster_name
}
```
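After applying, it is worth confirming the tags actually landed, since missing discovery tags are one of the most common reasons Karpenter fails to launch nodes. A quick check with the AWS CLI (substitute your cluster name):

```bash
# List subnets and security groups carrying the Karpenter discovery tag
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=<CLUSTER_NAME>" \
  --query 'Subnets[].SubnetId'

aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=<CLUSTER_NAME>" \
  --query 'SecurityGroups[].GroupId'
```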
1.4 Update aws-auth ConfigMap
Add the Karpenter node role to the aws-auth ConfigMap:
```hcl
# aws-auth.tf
resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode(concat(
      var.existing_map_roles,
      [{
        rolearn  = aws_iam_role.karpenter_node.arn
        username = "system:node:{{EC2PrivateDNSName}}"
        groups = [
          "system:bootstrappers",
          "system:nodes"
        ]
      }]
    ))
  }

  force = true
}
```
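If your cluster uses EKS access entries (the newer alternative to the aws-auth ConfigMap), you can grant the node role access that way instead. A minimal sketch, assuming the cluster's authentication mode permits access entries:

```hcl
# Alternative to editing aws-auth: register the node role as an EC2 Linux access entry
resource "aws_eks_access_entry" "karpenter_node" {
  cluster_name  = var.cluster_name
  principal_arn = aws_iam_role.karpenter_node.arn
  type          = "EC2_LINUX"
}
```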
1.5 Variables and Outputs
Create variables file:
```hcl
# variables.tf
variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
}

variable "region" {
  description = "AWS region"
  type        = string
}

variable "oidc_provider_arn" {
  description = "ARN of the OIDC provider for the EKS cluster"
  type        = string
}

variable "private_subnet_ids" {
  description = "List of private subnet IDs for Karpenter nodes"
  type        = list(string)
}

variable "node_security_group_ids" {
  description = "List of security group IDs for Karpenter nodes"
  type        = list(string)
}

# Referenced by aws-auth.tf above
variable "existing_map_roles" {
  description = "Existing mapRoles entries to preserve in the aws-auth ConfigMap"
  type        = list(any)
  default     = []
}
```
And outputs:
```hcl
# outputs.tf
output "karpenter_controller_role_arn" {
  description = "ARN of the Karpenter controller IAM role"
  value       = aws_iam_role.karpenter_controller.arn
}

output "karpenter_node_role_name" {
  description = "Name of the Karpenter node IAM role"
  value       = aws_iam_role.karpenter_node.name
}

output "karpenter_node_instance_profile_name" {
  description = "Name of the Karpenter node instance profile"
  value       = aws_iam_instance_profile.karpenter_node.name
}
```
Apply the Terraform configuration:
```bash
terraform init
terraform plan
terraform apply
```
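The controller role ARN from the outputs is what you will plug into the Helm values in Step 2, so it is handy to capture it now:

```bash
# Print the controller role ARN for use in karpenter-values.yaml
terraform output -raw karpenter_controller_role_arn
```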
Step 2: Installing Karpenter Using Helm
2.1 Create Helm Values File
Create a values file for Karpenter:
```yaml
# karpenter-values.yaml
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/karpenter-controller-<CLUSTER_NAME>

settings:
  # Cluster name for discovery
  clusterName: <CLUSTER_NAME>
  # Interruption queue for spot instance handling (optional but recommended)
  interruptionQueue: <CLUSTER_NAME>

# Controller resource requests/limits
controller:
  resources:
    requests:
      cpu: 1
      memory: 1Gi
    limits:
      cpu: 2
      memory: 2Gi

# Replica count (increase to 2 for production HA)
replicas: 1

# Webhook settings
webhook:
  enabled: true
  port: 8443
```
Replace <ACCOUNT_ID> and <CLUSTER_NAME> with your actual values.
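A quick way to fill in the placeholders without hand-editing (the in-place sed flag assumes GNU sed; the cluster name below is an example):

```bash
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
CLUSTER_NAME=my-cluster   # example value

# Substitute the placeholders in karpenter-values.yaml
sed -i "s/<ACCOUNT_ID>/${ACCOUNT_ID}/g; s/<CLUSTER_NAME>/${CLUSTER_NAME}/g" karpenter-values.yaml
```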
2.2 Install Karpenter
Install Karpenter using Helm:
```bash
# Karpenter is published as an OCI chart, so no `helm repo add` / `helm repo update` step is needed.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --version 1.8.3 \
  --values karpenter-values.yaml \
  --wait
```
2.3 Verify Installation
Check that Karpenter is running:
```bash
kubectl get pods -n karpenter
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=50
```
You should see output similar to:
```text
NAME                         READY   STATUS    RESTARTS   AGE
karpenter-xxxxxxxxxx-xxxxx   1/1     Running   0          1m
```
Step 3: Configuring EC2NodeClasses
EC2NodeClasses define AWS-specific configuration for nodes. Let’s create two: one for general workloads and one for GPU workloads.
3.1 General Workload EC2NodeClass
```yaml
# ec2nodeclass-general.yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  # EC2NodeClass is cluster-scoped, so no namespace is set
  name: default
spec:
  # IAM role for nodes (created by Terraform)
  role: karpenter-node-<CLUSTER_NAME>

  # Subnet discovery using the karpenter.sh/discovery tag
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  # Security group discovery using the karpenter.sh/discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  # AMI selection using EKS-optimized AMIs
  # Use latest AL2023 AMIs with automatic updates
  amiSelectorTerms:
    - alias: al2023@latest

  # User data for node initialization
  userData: |
    #!/bin/bash
    # Configure kubelet with custom settings
    echo "Running custom node initialization..."
    # Install any additional packages or configurations here
    # Example: yum install -y htop

  # Block device mappings for root volume
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true

  # Metadata options for security
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required # Require IMDSv2

  # Detailed CloudWatch monitoring (disabled here to save cost)
  detailedMonitoring: false

  # Tags to apply to all resources
  tags:
    karpenter.sh/discovery: <CLUSTER_NAME>
    environment: production
    managed-by: karpenter
```
3.2 GPU Workload EC2NodeClass
```yaml
# ec2nodeclass-gpu.yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  role: karpenter-node-<CLUSTER_NAME>

  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: <CLUSTER_NAME>

  # Use GPU-optimized AMI
  amiSelectorTerms:
    - alias: al2023@latest

  # Custom user data for GPU nodes
  userData: |
    #!/bin/bash
    echo "Initializing GPU node..."
    # NVIDIA drivers are included in EKS GPU AMIs
    # Additional GPU-specific configuration can go here

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi # Larger for GPU workloads
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true

  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required

  detailedMonitoring: true # Enable for GPU nodes

  tags:
    karpenter.sh/discovery: <CLUSTER_NAME>
    environment: production
    workload-type: gpu
    managed-by: karpenter
```
Apply the EC2NodeClasses:
```bash
kubectl apply -f ec2nodeclass-general.yaml
kubectl apply -f ec2nodeclass-gpu.yaml

# Verify (EC2NodeClass is cluster-scoped)
kubectl get ec2nodeclasses
```
Step 4: Creating NodePools
NodePools define the constraints and requirements for nodes that Karpenter will provision.
4.1 General Workload NodePool
```yaml
# nodepool-general.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  # NodePool is cluster-scoped, so no namespace is set
  name: general
spec:
  # Template for nodes created by this NodePool
  template:
    metadata:
      labels:
        workload-type: general
    spec:
      # Reference to the default EC2NodeClass
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default

      # Requirements for general workload nodes
      requirements:
        # Architecture - amd64 for compatibility
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        # Operating system
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        # Capacity type - prefer Spot instances for cost savings
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Instance types - diversified for better Spot availability
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            # T3a family (burstable, cost-effective)
            - t3a.medium
            - t3a.large
            - t3a.xlarge
            # M6a family (balanced compute/memory)
            - m6a.large
            - m6a.xlarge
            - m6a.2xlarge
            # M6i family (latest generation)
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
            # M5 family (previous generation, often cheaper)
            - m5.large
            - m5.xlarge
            - m5.2xlarge
            # C6a family (compute optimized)
            - c6a.large
            - c6a.xlarge
            - c6a.2xlarge
        # Availability zones
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-west-2a
            - us-west-2b
            - us-west-2c

      # Node expiration - rotate nodes every 7 days for security updates
      expireAfter: 168h # 7 days

  # Resource limits for this NodePool
  limits:
    cpu: "500"
    memory: 2000Gi

  # Disruption settings for general nodes
  disruption:
    # Consolidate when nodes are empty OR underutilized for cost optimization
    consolidationPolicy: WhenEmptyOrUnderutilized
    # Wait 30 seconds before consolidating (fast scale-down)
    consolidateAfter: 30s
    # Disruption budgets - allow 10% of nodes to be disrupted at any time
    budgets:
      - nodes: "10%"

  # Weight for prioritization (higher number = higher priority)
  weight: 10
```
4.2 GPU Workload NodePool
```yaml
# nodepool-gpu.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    metadata:
      labels:
        workload-type: gpu
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu

      # Taints to prevent non-GPU workloads from scheduling
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule

      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        # On-demand only for GPU (more reliable than Spot)
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        # GPU instance types (g4dn family)
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - g4dn.xlarge    # 1 GPU, 4 vCPUs, 16 GB
            - g4dn.2xlarge   # 1 GPU, 8 vCPUs, 32 GB
            - g4dn.4xlarge   # 1 GPU, 16 vCPUs, 64 GB
            - g4dn.8xlarge   # 1 GPU, 32 vCPUs, 128 GB
            - g4dn.12xlarge  # 4 GPUs, 48 vCPUs, 192 GB
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-west-2a
            - us-west-2b
            - us-west-2c

      # Longer expiration for GPU nodes (more expensive to churn)
      expireAfter: 720h # 30 days

  limits:
    cpu: "200"
    memory: 800Gi

  disruption:
    # Only consolidate when empty (preserve running GPU workloads)
    consolidationPolicy: WhenEmpty
    # Wait longer before consolidating GPU nodes
    consolidateAfter: 300s # 5 minutes
    budgets:
      - nodes: "0" # No automatic disruption for GPU nodes

  # Lower weight than general (lower priority)
  weight: 5
```
Apply the NodePools:
```bash
kubectl apply -f nodepool-general.yaml
kubectl apply -f nodepool-gpu.yaml

# Verify (NodePool is cluster-scoped)
kubectl get nodepools
```
Expected output:
```text
NAME      READY   AGE
general   True    5s
gpu       True    5s
```
Step 5: Testing and Validation
Now let’s test that Karpenter is working correctly.
5.1 Test General Workload Scaling
Create a test deployment:
```yaml
# test-general-workload.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test-general
  namespace: default
spec:
  replicas: 5
  selector:
    matchLabels:
      app: karpenter-test-general
  template:
    metadata:
      labels:
        app: karpenter-test-general
    spec:
      # Node selector to target the general NodePool
      nodeSelector:
        workload-type: general
      containers:
        - name: nginx
          image: nginx:latest
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "1"
              memory: "2Gi"
```
Deploy and watch Karpenter provision nodes:
```bash
kubectl apply -f test-general-workload.yaml

# Watch nodes being created
kubectl get nodes -l karpenter.sh/nodepool -w

# Check Karpenter logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f
```
You should see nodes appear within 30-60 seconds.
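To see which instance types, capacity types, and zones Karpenter actually chose, print the relevant node labels as extra columns:

```bash
# Show Karpenter-provisioned nodes with their instance type, capacity type, and zone
kubectl get nodes -l karpenter.sh/nodepool \
  -L node.kubernetes.io/instance-type,karpenter.sh/capacity-type,topology.kubernetes.io/zone
```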
5.2 Test GPU Workload Scaling
Create a GPU test deployment. Note that pods can only request nvidia.com/gpu if the NVIDIA device plugin (typically installed as a DaemonSet) is running in the cluster to advertise the GPUs:
```yaml
# test-gpu-workload.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: karpenter-test-gpu
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: karpenter-test-gpu
  template:
    metadata:
      labels:
        app: karpenter-test-gpu
    spec:
      nodeSelector:
        workload-type: gpu
      # Tolerate the GPU taint
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: cuda-test
          image: nvidia/cuda:11.8.0-base-ubuntu22.04
          command: ["sleep", "infinity"]
          resources:
            requests:
              nvidia.com/gpu: "1"
              cpu: "2"
              memory: "8Gi"
            limits:
              nvidia.com/gpu: "1"
              cpu: "2"
              memory: "8Gi"
```
Deploy and verify:
```bash
kubectl apply -f test-gpu-workload.yaml
kubectl get nodes -l workload-type=gpu -w
```
5.3 Test Consolidation
Delete the test workloads and watch Karpenter terminate unused nodes:
```bash
kubectl delete -f test-general-workload.yaml
kubectl delete -f test-gpu-workload.yaml

# Watch nodes being terminated
kubectl get nodes -l karpenter.sh/nodepool -w
```
Nodes should be terminated within 30 seconds to 5 minutes depending on the consolidation policy.
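To see the consolidation decisions as they happen, it can help to filter the controller logs for disruption activity:

```bash
# Follow the controller and surface consolidation/disruption messages
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f | grep -iE "consolidat|disrupt"
```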
5.4 Automated Validation Script
Create a comprehensive validation script:
```bash
#!/bin/bash
# karpenter-validation.sh
set -e

GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m'

echo -e "${GREEN}[1/6] Checking Karpenter controller...${NC}"
kubectl get deployment -n karpenter karpenter
kubectl wait --for=condition=available --timeout=60s deployment/karpenter -n karpenter

echo -e "${GREEN}[2/6] Checking NodePools...${NC}"
kubectl get nodepools

echo -e "${GREEN}[3/6] Checking EC2NodeClasses...${NC}"
kubectl get ec2nodeclasses

echo -e "${GREEN}[4/6] Deploying test workload...${NC}"
kubectl apply -f test-general-workload.yaml

echo -e "${GREEN}[5/6] Waiting for nodes to be provisioned (max 5 min)...${NC}"
timeout 300 bash -c 'until [ $(kubectl get nodes -l karpenter.sh/nodepool --no-headers | wc -l) -gt 0 ]; do sleep 5; done'
NODES=$(kubectl get nodes -l karpenter.sh/nodepool --no-headers | wc -l)
echo -e "${GREEN}✓ Karpenter provisioned $NODES node(s)${NC}"
kubectl get nodes -l karpenter.sh/nodepool

echo -e "${GREEN}[6/6] Cleaning up...${NC}"
kubectl delete -f test-general-workload.yaml

echo -e "${GREEN}✓ Validation complete!${NC}"
```
Run the validation:
```bash
chmod +x karpenter-validation.sh
./karpenter-validation.sh
```
Monitoring and Troubleshooting
Key Commands for Monitoring
```bash
# View NodePool status (cluster-scoped)
kubectl get nodepools

# View provisioned nodes
kubectl get nodes -l karpenter.sh/nodepool

# Check controller logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f

# View events
kubectl get events -n karpenter --sort-by='.lastTimestamp'

# View detailed NodePool information
kubectl describe nodepool general
```
Common Issues
1. Nodes Not Provisioning
Check Karpenter logs:
```bash
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=100
```
Common causes:
- IAM permissions missing → Verify controller policy
- Subnet capacity exhausted → Check subnet available IPs
- Security groups misconfigured → Verify discovery tags
- Instance type unavailable → Check AWS service quotas
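Provisioning failures also surface as status conditions and events on the NodeClaim that Karpenter created for the pending pods, which is often quicker to read than raw logs:

```bash
# Inspect the NodeClaim for the failed launch; check its conditions and events
kubectl get nodeclaims
kubectl describe nodeclaim <nodeclaim-name>
```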
2. Karpenter Controller CrashLooping
```bash
kubectl get sa karpenter -n karpenter -o yaml
kubectl describe pod -n karpenter -l app.kubernetes.io/name=karpenter
```
Common causes:
- IRSA misconfiguration → Verify IAM role ARN annotation
- OIDC provider not configured → Check EKS cluster OIDC
- Webhook certificate issues → Restart controller
3. Nodes Not Terminating
```bash
kubectl get pdb -A
kubectl get nodepools -o yaml
```
Common causes:
- PodDisruptionBudgets too restrictive
- Consolidation policy too conservative
- karpenter.sh/do-not-disrupt annotation set on pods
Prometheus Metrics
Karpenter exposes metrics on port 8080:
```bash
kubectl port-forward -n karpenter svc/karpenter 8080:8080
curl http://localhost:8080/metrics | grep karpenter
```
Key metrics to monitor:
- karpenter_nodes_created_total: Total nodes created
- karpenter_nodes_terminated_total: Total nodes terminated
- karpenter_provisioner_scheduling_duration_seconds: Time to provision nodes
- karpenter_cloudprovider_instance_type_cpu_cores: CPU capacity by instance type
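If you run the Prometheus Operator, a ServiceMonitor is the usual way to scrape these metrics. A minimal sketch; the label selector and port name below assume the Helm chart defaults, so verify them against your install (kubectl get svc -n karpenter karpenter -o yaml) and adjust:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: karpenter
  namespace: karpenter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: karpenter
  endpoints:
    - port: http-metrics   # port name on the karpenter Service (assumed; confirm in your cluster)
      interval: 30s
```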
Best Practices and Cost Optimization
1. Instance Type Diversity
Allow Karpenter to choose from a wide range of instance types for better Spot availability and cost optimization:
```yaml
requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
      - t3a.large
      - t3a.xlarge
      - m6a.large
      - m6a.xlarge
      - m6i.large
      - m6i.xlarge
      # Add 10-15 instance types for best results
```
2. Use Spot Instances Strategically
- General workloads: 70-80% Spot, 20-30% On-Demand
- Stateful workloads: 100% On-Demand
- GPU workloads: 100% On-Demand (Spot GPU can be expensive)
3. Set Resource Limits
Prevent runaway costs by setting limits:
```yaml
limits:
  cpu: "500"
  memory: 2000Gi
```
4. Enable Consolidation
Aggressive consolidation saves money:
```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 30s
```
5. Separate NodePools by Workload Characteristics
Create separate NodePools for:
- CPU-intensive workloads (compute-optimized instances)
- Memory-intensive workloads (memory-optimized instances)
- GPU workloads (GPU instances)
- Burstable workloads (T-family instances)
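Rather than maintaining long explicit instance-type lists for each pool, you can express these categories with Karpenter's well-known instance labels. A sketch for a memory-optimized pool (the generation cutoff is just an example):

```yaml
requirements:
  # Memory-optimized (r-family) instances, generation 5 or newer
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["r"]
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values: ["4"]
```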
6. Use Mutually Exclusive NodePools
Ensure NodePools don’t overlap to avoid random selection:
```yaml
# NodePool 1: only Spot
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot"]

# NodePool 2: only On-Demand
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]
```
7. Pin AMI Versions in Production
For production, pin specific AMI versions to avoid unexpected updates:
```yaml
amiSelectorTerms:
  - id: ami-0123456789abcdef0 # Specific AMI ID
```
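If you prefer to stay on the EKS-optimized AMI family while still controlling rollouts, the alias form also accepts a pinned release version instead of latest. The version string below is purely illustrative; use a release that exists for your Kubernetes version:

```yaml
amiSelectorTerms:
  - alias: al2023@v20250101   # example pinned AMI release version, not a real recommendation
```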
8. Implement Spot Interruption Handling
Set up an SQS queue for graceful Spot interruption handling:
```hcl
# Create SQS queue (via Terraform)
resource "aws_sqs_queue" "karpenter_interruption" {
  name = var.cluster_name
}

# Subscribe to EC2 Spot interruption events
resource "aws_cloudwatch_event_rule" "spot_interruption" {
  name = "${var.cluster_name}-spot-interruption"
  event_pattern = jsonencode({
    source        = ["aws.ec2"]
    "detail-type" = ["EC2 Spot Instance Interruption Warning"]
  })
}
```
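The rule above only defines the event pattern; the events still need to be routed into the queue, and the queue must allow EventBridge to deliver to it. A sketch of the remaining wiring follows (and remember the controller policy needs sqs:ReceiveMessage and related actions on this queue, as shown in Step 1):

```hcl
# Route matched events into the interruption queue
resource "aws_cloudwatch_event_target" "spot_interruption" {
  rule = aws_cloudwatch_event_rule.spot_interruption.name
  arn  = aws_sqs_queue.karpenter_interruption.arn
}

# Allow EventBridge to send messages to the queue
resource "aws_sqs_queue_policy" "karpenter_interruption" {
  queue_url = aws_sqs_queue.karpenter_interruption.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = aws_sqs_queue.karpenter_interruption.arn
    }]
  })
}
```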
Expected Cost Savings
Based on typical workload distributions:
| Workload Type | % of Cluster | Savings vs Managed Node Groups |
|---|---|---|
| General (Spot) | 70% | 60-70% |
| General (On-demand) | 10% | 20-30% |
| GPU (On-demand) | 20% | 20-30% |
| Overall | 100% | 40-60% |
For a cluster with $100,000 in annual compute costs, that distribution works out to roughly $40,000-$60,000 per year in savings; treat these figures as a planning estimate rather than a guarantee, since they depend on your workload mix and Spot usage.
Conclusion
Karpenter represents a significant evolution in Kubernetes autoscaling, offering faster provisioning times, better cost optimization, and simpler configuration compared to the traditional Cluster Autoscaler.
Key takeaways:
- Setup: Use Terraform for IAM roles, Helm for Karpenter installation
- Configuration: Define NodePools for different workload types, use EC2NodeClasses for AWS-specific settings
- Cost Optimization: Leverage Spot instances, enable consolidation, use instance diversity
- Monitoring: Watch controller logs, track metrics, set up alerts
- Testing: Validate with test workloads before migrating production traffic
By following this guide, you should have a fully functional Karpenter setup that can provision nodes in under 60 seconds and reduce your EKS compute costs by 40-60%.