Cloud technologies improve your infrastructure's scalability and flexibility, but costs can easily compound without careful management. The default cloud options, while convenient, are not always the most cost-effective choice for your workloads. Spot instances and vCluster can be cost hacks for your cloud Kubernetes environments, helping you do more with less.
Spot instances take advantage of spare compute capacity offered by cloud providers at discounted rates (up to 91 percent off standard prices). The trade-off is that these instances can be reclaimed whenever demand rises, often with only minutes of warning before termination. This unpredictability limits the production use cases for spot instances, relegating them to fault-tolerant and resilient tasks, such as batch processing or CI/CD pipelines.
In Kubernetes environments, orchestration and containerized applications can help to mitigate these instance interruptions. vCluster offers isolated virtual Kubernetes clusters that can be rapidly deployed and torn down within your environment. With vCluster, you can deploy flexible and resilient workloads while still benefiting from the cost savings of spot instances.
This article explains how to combine spot instances and vCluster in your Kubernetes environment to build a resilient and cost-effective architecture.
Why Combine Spot Instances with vCluster?
Cloud providers offer various approaches for reducing your costs, including discounted resources, long-term reservation plans, and alternative pricing models with certain trade-offs. Spot instances are one such option, providing access to significantly discounted "excess capacity" compute resources that would otherwise go unused. AWS Spot Instances, GCP Preemptible Virtual Machines, and Azure Spot Virtual Machines are popular examples of this.
Spot instances may be automatically terminated when demand rises, capacity becomes unavailable, or the market price exceeds your bid limit. This inherent unpredictability calls for careful planning; spot instances are best used for batch jobs, CI/CD pipelines, and AI/ML workloads where interruptions can be tolerated.
vCluster helps optimize costs with its ability to create virtual Kubernetes clusters for multitenant, multiworkload, and multicluster scenarios. Compared to provisioning multiple physical clusters, vCluster reduces costs and operational overhead while providing better security and management than Kubernetes namespaces.


Creating virtual clusters on your spot instances decouples workloads from the underlying infrastructure. This abstraction ensures that workloads can easily migrate to other available resources in the parent cluster if the spot instances are interrupted.
The diagram shows how virtual clusters act as an abstraction layer between workloads and cluster resources. They can be configured to depend on specific node resources or migrate between node groups as needed.

Prerequisites
To complete this tutorial, you'll need access to a compatible cloud-based Kubernetes environment. Popular options include AWS (EKS), GCP (GKE), and Azure (AKS). Ensure your Kubernetes environment is compatible with vCluster; Kubernetes 1.30 is the recommended version. Additionally, you'll need appropriate permissions to install CLI tools and provision cloud instances within your environment.
Deploying vCluster on Spot Instances
Using your cloud-connected CLI, such as Cloud Shell, run the following command to install vCluster:
curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/latest/download/vcluster-linux-amd64" && sudo install -c -m 0755 vcluster /usr/local/bin && rm -f vcluster
Verify the installation:
vcluster version
For alternative installation methods, refer to the vCluster documentation.
Spin Up a Kubernetes Cluster
You can create a cloud Kubernetes cluster for your environment with your provider's user interface or a CLI tool like gcloud. Cloud Shell has gcloud preinstalled, which you can use to spin up and configure cluster resources from your CLI.
Let's create a GKE cluster using the following commands:
gcloud config set project <GCLOUD PROJECT NAME>
gcloud services enable container.googleapis.com
export CLUSTER_NAME=samplecluster
export ZONE=us-east1-b
export MACHINE_TYPE=e2-standard-2
gcloud container clusters create $CLUSTER_NAME --zone $ZONE --cluster-version 1.30 \
--machine-type=$MACHINE_TYPE --num-nodes=2
This sets up a GKE cluster in the us-east1-b zone (South Carolina) and attaches an on-demand node pool with two e2-standard-2 instances.
With the cluster set up, you can add more instances and node pools to it. Let's add some spot instances.
gcloud container node-pools create spot-group --cluster=$CLUSTER_NAME --zone=$ZONE \
--enable-autoscaling --num-nodes=2 --min-nodes=2 --max-nodes=3 --machine-type=$MACHINE_TYPE --spot --node-labels=nodetype=spot
This command adds a node pool with two labeled e2-standard-2 spot instances to your cluster. It also enables autoscaling with limits between two and three nodes, ensuring your workloads remain highly available without overcommitting resources.
You've now set up a cluster that includes two node pools containing both spot and on-demand instances.
You can check that the nodes have been successfully created with this `kubectl` command:
kubectl get nodes -o wide
When you use the CLI to create clusters and nodes, it automatically adds several labels to your instances, including details about the instance capacity type.
Use the following commands to view the instance labels:
kubectl get nodes --show-labels
or
gcloud container node-pools describe spot-group --cluster=$CLUSTER_NAME --zone=$ZONE
You should see the `cloud.google.com/gke-provisioning=spot,cloud.google.com/gke-spot=true` labels that gcloud generated and your custom `nodetype=spot` label.

Labels help you select and group resources, enable pod-node affinity, and organize your cluster logically.
With that, you've now set up your Kubernetes environment with some spot instances. In the next sections, you'll learn how to configure and work with vCluster.
Working with vCluster
By default, vCluster uses pseudo nodes for its internal workloads. These pseudo nodes can be based on one or multiple "real" nodes in your Kubernetes cluster.
You can configure vCluster to sync all or parts of your cluster with the created virtual cluster. To set up this configuration, you can deploy your virtual clusters with config YAMLs:
sync:
  fromHost:
    nodes:
      enabled: true
      selector:
        labels:
          nodetype: spot
This configuration tells vCluster to sync and use only the labeled spot instances, restricting all vCluster activities to those particular nodes.
For cost savings, pseudo nodes are ideal since they map to one or more real nodes in the host cluster and remain available as long as at least one real node is running.
Let's deploy a virtual cluster in your Kubernetes environment using the following command:
vcluster create samplevc -n vc
This command creates the virtual cluster and immediately switches your kubectl context to it. Run the following command to check:
kubectl get nodes --show-labels
This shows that the pseudo node was created for your virtual cluster.

Note: A pending vCluster pod is likely related to your resource requests and limits. Inspect the node with `kubectl describe node <node-name>` and your persistent volume claims with `kubectl get pvc -n vc` to check whether you've run afoul of cloud capacity limitations and quotas.
With your virtual cluster set up, let's test some workloads on it. Your virtual cluster still benefits from the workload scheduling options of your Kubernetes environment. You can use labels, affinity, taints, and tolerations to control pod scheduling and reserve nodes for specific workloads. With your current setup, you can deploy workloads and pods using your "real" nodes' labels and taints.
For example, let's create a deployment to run a processing job on your spot instance with the following YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-test
  template:
    metadata:
      labels:
        app: load-test
    spec:
      nodeSelector:
        cloud.google.com/gke-provisioning: spot
      containers:
      - name: load-generator
        image: busybox
        command:
        - /bin/sh
        - -c
        - "while true; do dd if=/dev/zero of=/dev/null bs=1M count=100; sleep 5; done"
Running this within your virtual cluster creates a pod on your pseudo node that utilizes resources from particular "real" nodes, specifically the labeled spot instances.

You now have a working virtual cluster where you can schedule workloads and data processes.
Scaling Your Workloads
Let's look at some scaling options to add resilience and fault-tolerance capabilities to your virtual cluster.
Enabling autoscaling in your virtual cluster allows workloads to adjust to demand by automatically spinning up new resources as needed. Autoscaling relies on metrics from your deployments and pods. While GKE includes a metrics server by default, vCluster requires either synchronizing with the host cluster's nodes or installing a metrics server within the virtual cluster. The best time to install the metrics server is during the virtual cluster's initialization.
First, delete your initial virtual cluster with the following command:
vcluster delete samplevc -n vc
Then create a YAML file and input the following configuration:
integrations:
  metricsServer:
    enabled: true
Using this configuration, create a new virtual cluster:
vcluster create samplevc -n vc -f <YAML_FILE>
With the metrics server enabled, you now have access to real-time metrics across your virtual cluster nodes and pods.
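To confirm this, you can query live usage data with kubectl's top commands (these assume your kubectl context is still connected to the virtual cluster):
kubectl top nodes
kubectl top pods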
You can use Kubernetes' Horizontal Pod Autoscaler and Vertical Pod Autoscaler with your virtual cluster workloads.
For example, let's deploy, expose, and scale an Apache server within your virtual cluster. You can create your deployment's configuration file using these commands:
cat <<EOF > apache.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
EOF
These commands define a Kubernetes application deployment for your Apache server and a service to expose it.
Deploy the application with the following command:
kubectl apply -f apache.yaml
With your deployment set up, you can immediately implement horizontal autoscaling using this kubectl command:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
This command creates a HorizontalPodAutoscaler that monitors the deployment's CPU utilization and spins up replicas once it goes past 50 percent.
Check that your horizontal pod autoscaler is running with this command:
kubectl get hpa
Let's stress test the deployment to see the autoscaling in action. Open two separate terminal sessions and execute the following commands:
#first terminal - stress test
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
# second terminal - scaling monitor
kubectl get hpa php-apache -w
As the application's CPU is stressed past 50 percent utilization, the autoscaler spins up more replica pods, up to the limit of ten. If you stop the traffic to the deployment, the replicas are removed as CPU utilization falls.

Your nodes can also take advantage of autoscaling: managed node pools automatically adjust the number of nodes as needed to optimize resource usage. This means your current state of two spot instances will be maintained even if instances are terminated. When interruptions or terminations are detected, new nodes are created until the desired count is reached (in this case, two nodes each in the spot and on-demand pools).
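For example, you can adjust the autoscaling limits of an existing node pool at any time. Here's a minimal sketch using the cluster and pool names from earlier:
gcloud container clusters update $CLUSTER_NAME --zone $ZONE --node-pool=spot-group \
--enable-autoscaling --min-nodes=2 --max-nodes=4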
You can also set up cluster autoscaling to ensure your node pools can respond to demand. New nodes will be created to meet the requirements of pending workloads, and nodes will be removed when resources are underutilized. Karpenter is a helpful open source tool for node autoscaling and lifecycle management that can be easily integrated with vCluster and eksctl.
Managing Spot Instance Interruptions
Workloads in this setup are designed to tolerate spot instance interruptions. An interrupted instance triggers a new instance request while workloads continue using the remaining available resources. You can simulate interruptions, by manually stopping instances or generating traffic spikes, to observe application behavior within vCluster and its performance patterns under stress.
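For example, you can approximate an interruption by draining one of the spot nodes, which evicts its pods much like a real termination notice would. A minimal sketch, with a placeholder node name:
# run against the host cluster, not the virtual cluster
kubectl drain <spot-node-name> --ignore-daemonsets --delete-emptydir-data
# restore the node after the test
kubectl uncordon <spot-node-name>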
It's also important to distribute your workloads to soften the impact of a spot instance termination. You can do this with pod anti-affinity rules, for example, setting virtual cluster pods to repel each other so that they aren't scheduled on the same nodes. You can also use Kubernetes controllers, like ReplicaSets and StatefulSets, to keep multiple replicas of your application running on different nodes.
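As a sketch, the following fragment, placed under a pod template's spec, would keep replicas of the earlier php-apache deployment off the same node; the labels assume the deployment shown above:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          run: php-apache
      topologyKey: kubernetes.io/hostname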
Increased demand for your required instance types or scheduling issues can occasionally leave no spot instances available. In these cases, you can rely on your other node pools with on-demand or reserved instances to handle the necessary workloads. The preferredDuringSchedulingIgnoredDuringExecution affinity rule lets you build this failover: your preferred instances are prioritized, but the scheduler can fall back to other nodes when none are available.
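For instance, here's a hedged sketch of a node affinity block that prefers the GKE spot label from earlier but still allows scheduling on on-demand nodes when spot capacity is unavailable:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: cloud.google.com/gke-spot
          operator: In
          values:
          - "true"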
Cost Management and Optimization Techniques
Observability is essential to your maintenance and optimization process. Use monitoring tools, like Prometheus and Amazon CloudWatch, to track how your cluster performs, including metrics like pod rescheduling events, latency spikes during failovers, and resource utilization across nodes.
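As an illustrative sketch, a Prometheus alerting rule could flag pods stuck in Pending after a failover; the metric assumes kube-state-metrics is installed, and names may differ in your setup:
groups:
- name: spot-capacity
  rules:
  - alert: PodsPendingTooLong
    expr: sum(kube_pod_status_phase{phase="Pending"}) > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: Pods have been pending for more than 10 minutes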
Having visibility into your cluster expenses and resource utilization is a foundational hack for your cost management. Cloud providers offer tools such as AWS Budgets, Google Cloud Billing, or Azure Cost Management to help you visualize and set spending thresholds.
These tools give you an overview of the cost of the resources you are using, and they commonly provide limits and alerting features. Create alerts for different levels, including critical and warning thresholds, then integrate them into email or Slack for real-time reporting.
GKE offers extensive observability dashboards, allowing you to monitor and set alerts for multiple clusters, nodes, pods, or workloads directly from the Google Cloud console. It provides easily configurable templates and various notification channels for your organization. Resource quotas can also help prevent excessive or unplanned resource usage by setting hard limits that deployments cannot exceed.
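For example, a minimal ResourceQuota for the vc namespace might look like this; the limits are illustrative and should match your own budget:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: vc-quota
  namespace: vc
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi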
Cloud providers offer a wide range of compute, storage, and networking options to support diverse use cases, from general-purpose workloads to applications requiring high availability. Keeping a balanced mix of instance types is necessary for optimal performance, cost-efficiency, and reliability. Use your spot instances for noncritical or fault-tolerant workloads, reserved instances for long-term cost savings in predictable workloads, and on-demand instances for mission-critical workloads with uptime requirements.
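As a sketch, you could reserve on-demand capacity for critical workloads by adding a tainted node pool; the pool name and taint key here are illustrative:
gcloud container node-pools create critical-group --cluster=$CLUSTER_NAME --zone=$ZONE \
--num-nodes=1 --machine-type=$MACHINE_TYPE --node-taints=workload=critical:NoSchedule
Only pods that tolerate the workload=critical taint can then be scheduled on those nodes.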
Important: When you're done, destroy all resources created with gcloud to stop incurring charges. The following command deletes the cluster:
gcloud container clusters delete $CLUSTER_NAME --zone $ZONE
Afterward, check the Google Cloud console to confirm that all resources have been deleted.
Conclusion
Cloud Kubernetes environments provide a flexible and scalable foundation for various use cases, including microservices, CI/CD deployments, scalable web applications, and ML/AI workloads. However, cost remains a significant barrier for many organizations. Using spot instances with vCluster helps reduce expenses while maintaining efficient and optimized workloads.
To see the benefits firsthand, try deploying a test environment with vCluster and spot instances. This allows you to experiment with cost savings and workload management without affecting production.
Have questions or suggestions?
We'd love for you to be part of our Slack Community! Come join the conversation and connect with us!