AWS Spot Instances offer an excellent opportunity to reduce cloud computing costs, but they come with a risk: AWS can reclaim them with little warning. Gergely Orosz recently tweeted about teams that weren't prepared for this and suffered high-profile outages as a result. In response, I shared my own approach: a combination of backup node pools and taints that maintains high availability even when Spot capacity disappears.
In this blog post, we'll discuss how to create a resilient infrastructure by utilizing AWS Spot Instances strategically along with backup node pools and taints, ensuring cost efficiency and availability.
Having a backup node pool that doesn't run on Spot Instances is critical for maintaining high availability. In this section, we'll set up a backup node pool and configure it to scale up and accept pods migrating off reclaimed Spot nodes.
Node pools are managed with your cloud provider's tooling rather than kubectl. Since the label selectors in this post use GKE's node-pool label, here's the gcloud version; the cluster name is a placeholder, and on EKS you'd create an equivalent on-demand node group (for example with eksctl). With --min-nodes=0, the pool scales to zero and costs nothing until pods actually need it.
gcloud container node-pools create backup-node-pool \
  --cluster=my-cluster \
  --enable-autoscaling --min-nodes=0 --max-nodes=5
Once the pool exists, label its nodes so workloads can target them:
kubectl label nodes -l cloud.google.com/gke-nodepool=backup-node-pool backup=ready
Now, with the backup node pool set up, we can discuss using taints and affinities to manage pods on Spot Instances.
Taints and affinities let you direct your Kubernetes deployments so that pods prefer Spot Instances while on-demand nodes remain available as a fallback. Tainting the Spot pool keeps pods without a matching toleration off those nodes, and a preferred node affinity steers tolerating pods toward them, improving cost efficiency without sacrificing stability.
kubectl taint nodes -l cloud.google.com/gke-nodepool=spot-node-pool spot=ready:NoSchedule
Since node affinity matches labels rather than taints, give the Spot nodes a matching spot=ready label as well:
kubectl label nodes -l cloud.google.com/gke-nodepool=spot-node-pool spot=ready
In the pod spec, a toleration lets the pod schedule onto the tainted Spot nodes, and the node affinity makes it prefer them:
tolerations:
  - key: spot
    operator: Equal
    value: ready
    effect: NoSchedule
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: spot
              operator: In
              values:
                - ready
With taints and affinities in place, you can now run pods with a preference for Spot Instances without risking high availability.
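For context, here's how the pieces fit into a complete manifest. This is a sketch: the Deployment name and image are placeholders, and it assumes the Spot nodes carry both the spot=ready label and the matching NoSchedule taint described above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Allow scheduling onto the tainted Spot nodes...
      tolerations:
        - key: spot
          operator: Equal
          value: ready
          effect: NoSchedule
      # ...and prefer them, while still allowing on-demand nodes.
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: spot
                    operator: In
                    values:
                      - ready
      containers:
        - name: web
          image: nginx:1.25   # placeholder image
```

Because the affinity is preferred rather than required, the scheduler falls back to on-demand nodes whenever Spot capacity is unavailable.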
Long-running jobs are especially vulnerable to Spot Instance reclaims, but a drain script can move them onto on-demand nodes when a reclaim is coming.
#!/bin/bash
# Drain every node in the Spot pool so its pods reschedule onto
# on-demand nodes (such as the backup pool).
NODES=$(kubectl get nodes -l cloud.google.com/gke-nodepool=spot-node-pool -o jsonpath='{.items[*].metadata.name}')
for NODE in $NODES; do
  # --delete-emptydir-data is the current name of the deprecated
  # --delete-local-data flag.
  kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data --force
done
# Uncordon the nodes only after the interruption risk has passed; doing it
# immediately would let pods schedule straight back onto Spot nodes:
#   kubectl uncordon "$NODE"
By using this drain script, you can maintain high availability by quickly reassigning long-running jobs to on-demand nodes when Spot Instances face impending reclaims.
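To avoid running the drain by hand, you can react to AWS's two-minute interruption notice automatically. The sketch below is an assumption on my part rather than a tool from this post: it polls the EC2 instance metadata endpoint spot/instance-action, which returns 404 until AWS schedules a reclaim and 200 afterwards, and drains the node when the notice appears. It assumes IMDSv1 is reachable (IMDSv2 requires a session token) and that NODE_NAME is injected, e.g. via the Downward API in a DaemonSet with hostNetwork.

```shell
#!/bin/bash
# Hypothetical Spot-interruption watcher: drain this node as soon as AWS
# announces a reclaim. NODE_NAME is assumed to be set by the environment.

# Decide from the HTTP status whether a reclaim has been scheduled:
# spot/instance-action returns 404 until then, 200 once it is.
needs_drain() {
  [ "$1" = "200" ]
}

# Poll the EC2 instance metadata service (IMDSv1 assumed available).
poll_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
    http://169.254.169.254/latest/meta-data/spot/instance-action
}

# Guarded so sourcing this file for reuse doesn't start the loop.
if [ "${1:-}" = "--watch" ]; then
  while true; do
    if needs_drain "$(poll_status)"; then
      kubectl drain "$NODE_NAME" --ignore-daemonsets --delete-emptydir-data --force
      break
    fi
    sleep 5
  done
fi
```

Run it with the --watch flag on each Spot node; with roughly two minutes of warning, the drain usually completes before the instance disappears.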
Managing AWS Spot Instances for cost savings requires a mix of strategies, including backup node pools, taints, and drain scripts. We've walked you through setting up a backup node pool, applying taints and affinities, and handling long-running jobs with drain scripts. By implementing these steps, you can maximize cost efficiency without compromising reliability in your AWS infrastructure. Give it a try; you won't be disappointed!
© Copyright 2025 StarOps.