Introduction

AWS Spot Instances offer an excellent opportunity to reduce costs in your cloud computing infrastructure, but they come with a risk: AWS can reclaim them with very little warning. Gergely Orosz recently tweeted about high-profile outages caused by teams being unprepared for exactly this. In response, I shared my own solution: a combination of backup node pools and taints that maintains high availability in this scenario.

In this blog post, we'll discuss how to create a resilient infrastructure by utilizing AWS Spot Instances strategically along with backup node pools and taints, ensuring cost efficiency and availability.

Section 1: How to Create a Backup Node Pool

Having a backup node pool that does not run on Spot Instances is critical for maintaining high availability. In this section, we'll look at setting up a backup node pool and configuring it to scale and accept migrating pods.

  1. First, create the backup node pool. kubectl cannot create node pools by itself; on AWS this is done through EKS, for example with eksctl (cluster name and sizes here are placeholders):
eksctl create nodegroup --cluster my-cluster --name backup-node-pool --nodes-min 0 --nodes-max 10 --managed
  2. Configure the node pool to auto-scale by running the Kubernetes Cluster Autoscaler in the cluster, for example via its Helm chart:
helm install cluster-autoscaler autoscaler/cluster-autoscaler --set autoDiscovery.clusterName=my-cluster
  3. Set up the node pool to accept migrating pods by labeling its nodes (pod affinities will match this label later):
kubectl label nodes -l eks.amazonaws.com/nodegroup=backup-node-pool backup=ready
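The same backup pool can also be declared declaratively. Here is a minimal sketch of an eksctl ClusterConfig fragment, assuming a cluster named my-cluster in us-east-1 (both placeholders), with the label baked in so new nodes arrive pre-labeled:

```yaml
# Hypothetical eksctl config: an on-demand backup node group that can
# scale from zero and carries the backup=ready label out of the box.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster      # placeholder cluster name
  region: us-east-1     # placeholder region
managedNodeGroups:
- name: backup-node-pool
  instanceType: m5.large
  minSize: 0
  maxSize: 10
  desiredCapacity: 0
  labels:
    backup: ready
```

A config file like this is applied with eksctl create nodegroup -f <file>, which removes the manual labeling step entirely.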

Now, with the backup node pool set up, we can discuss using taints and affinities to manage pods on Spot Instances.

Section 2: Using Taints and Affinities for Pod Management on Spot Instances

Taints and node affinities let you direct pods toward Spot Instances while keeping other workloads off them: the taint keeps non-tolerating pods away from the Spot pool, while the affinity makes tolerating pods prefer it. Together they increase cost efficiency while maintaining stability.

  1. Taint the Spot Instance node pool so that only pods which tolerate the taint are scheduled there, and label the same nodes so affinities can match them (affinities match labels, not taints):
kubectl taint nodes -l eks.amazonaws.com/nodegroup=spot-node-pool spot=ready:NoSchedule
kubectl label nodes -l eks.amazonaws.com/nodegroup=spot-node-pool spot=ready
  2. Configure your pods' affinities to prefer running on Spot Instances (the pods also need a toleration for the spot=ready:NoSchedule taint, or they can never land on the Spot pool at all):
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: spot
          operator: In
          values:
          - ready

With the taint, label, and affinities in place, you can run pods that prefer Spot Instances without risking high availability: when Spot capacity disappears, the scheduler simply places those pods on the untainted backup pool instead.
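Because the Spot pool carries a NoSchedule taint, the affinity alone is not enough: a pod without a matching toleration can never be scheduled onto those nodes. A minimal sketch of a pod spec fragment that tolerates the taint, strongly prefers Spot, and weakly falls back to the backup pool (this assumes the Spot nodes also carry a spot=ready label, since affinities match labels rather than taints, and reuses the backup=ready label from Section 1):

```yaml
# Hypothetical pod spec fragment: tolerate the Spot taint, prefer Spot
# nodes heavily, and fall back to the labeled backup pool.
tolerations:
- key: spot
  operator: Equal
  value: ready
  effect: NoSchedule
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100           # strong preference for Spot nodes
      preference:
        matchExpressions:
        - key: spot
          operator: In
          values: [ready]
    - weight: 1             # weak fallback preference for the backup pool
      preference:
        matchExpressions:
        - key: backup
          operator: In
          values: [ready]
```

The weight gap (100 vs. 1) is what keeps pods on cheap Spot capacity whenever it exists, while still letting the scheduler use the backup pool the moment it doesn't.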

Section 3: Handling Long-Running Jobs with Drain Scripts

Long-running jobs may be more vulnerable to Spot Instance reclaims, but you can use drain scripts to reassign these tasks to on-demand nodes when required.

  1. Create a drain script that moves workloads from the Spot node pool to the on-demand node pool (drained nodes are deliberately left cordoned, so evicted pods cannot land back on Spot):
#!/bin/bash
# Drain every node in the Spot node pool. kubectl drain cordons each node
# first, so the evicted pods reschedule onto the on-demand/backup pool.
NODES=$(kubectl get nodes -l eks.amazonaws.com/nodegroup=spot-node-pool -o jsonpath='{.items[*].metadata.name}')
for NODE in $NODES; do
    kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data --force
done
  2. Execute the drain script when required, such as when a Spot termination notice arrives.
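The second step above can be automated: AWS announces an impending reclaim roughly two minutes ahead through the instance metadata service. Below is a minimal sketch of a poller; the HTTP status code is injected as a parameter so the decision logic is testable offline, and drain-spot-nodes.sh is a hypothetical name for the drain script from the first step:

```shell
#!/bin/bash
# Sketch: poll the EC2 instance metadata service for a Spot interruption
# notice (would run on each Spot node, e.g. as a DaemonSet or systemd timer).
# The endpoint answers 200 with an action payload only once AWS has scheduled
# the instance for interruption; before that it returns 404.
METADATA_URL="http://169.254.169.254/latest/meta-data/spot/instance-action"

should_drain() {
  # Decide from the metadata HTTP status whether to trigger the drain.
  local http_code="$1"
  [ "$http_code" = "200" ]
}

poll_once() {
  # In a real deployment the status comes from the metadata service:
  #   curl -s -o /dev/null -w '%{http_code}' "$METADATA_URL"
  local http_code="$1"   # injected here so the logic runs without AWS
  if should_drain "$http_code"; then
    echo "termination notice received - draining"
    # ./drain-spot-nodes.sh   # hypothetical: the drain script from step 1
  else
    echo "no notice"
  fi
}

poll_once 404
poll_once 200
```

Looping this with a short sleep gives each Spot node a head start on evacuating pods before AWS pulls the instance.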

This drain script lets you maintain high availability by quickly moving long-running jobs to on-demand nodes when a Spot reclaim is imminent.

Conclusion

Managing AWS Spot Instances for cost savings requires a mix of strategies, including backup node pools, taints, and drain scripts. We've walked you through setting up a backup node pool, applying taints and affinities, and handling long-running jobs with drain scripts. By implementing these steps, you can maximize cost efficiency without compromising reliability in your AWS infrastructure. Give it a try – you won't be disappointed!

Stephen Lizcano


© Copyright 2025 StarOps.
