Automating Kubernetes Cleanup in CI Workflows

Hrittik Roy
4 Minute Read

Continuous Integration (CI) has resulted in a greater than 200% increase in deployment frequency and a 50% reduction in defect rates among major organizations worldwide [HCL Report].

Behind these statistics is the way modern CI practices improve Mean Time to Resolution (MTTR), market responsiveness, and risk reduction, while also raising overall software quality and the developer experience. The traditional approach to software testing environments was to configure general-purpose virtual machines for individual test runs. However, this created multiple challenges, including complicated workflows for resource management, debugging, and monitoring.

With CI adoption spreading and Kubernetes becoming the preferred orchestrator for containerized workloads, enterprises are implementing new processes to automate their CI pipelines.

Despite the advantages of using Kubernetes to automate CI pipelines, many organizations face challenges around cost optimization, resource overprovisioning, security, and maintenance overhead. This post dives deeper into these challenges and how to solve them to build a robust pipeline.

The Difficulties of Persistent Kubernetes Environments

A robust environment like Kubernetes offers a declarative interface, a programmable API, and multiple self-healing capabilities, which make it a “valuable capability in any discussion of CI/CD processes”.

Persistent Kubernetes environments have become the default choice for many organizations setting up their initial CI/CD pipelines due to their simplicity. While having a host cluster with pre-installed toolchains and configurations seems straightforward, this approach introduces significant challenges that can undermine the very benefits Kubernetes promises to deliver. Four key issues are detailed below:

1. Security Challenges in Persistent Environments

Long-running environments in which access controls are not regularly reviewed accumulate risk. Maintaining proper access controls and security policies across long-lived environments becomes increasingly complex over time, and with different teams and varying requirements, keeping these controls current is a challenging task.

Moreover, not all pipelines require the same level of privileged execution. With permissive access controls applied across all environments, a team may end up with more privileges than it needs and execute vulnerable images, jeopardizing sibling pipelines and the whole cluster environment.

The next problem is upholding the least-privilege philosophy while still giving developers cluster access for debugging through proper SSO. Cluster access becomes complicated as more developers need near-admin privileges when debugging, and on shared infrastructure that's far from optimal: one person's debugging can change state for everyone.
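To make this concrete, here is a minimal sketch of the namespace-scoped RBAC that shared clusters typically rely on (the namespace and group names are hypothetical). Every additional team, pipeline, and exception multiplies the number of such objects to review:

# Hypothetical example: confining the "team-a" CI pipeline to its own namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ci-runner
  namespace: team-a-ci
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "jobs", "configmaps", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-runner-binding
  namespace: team-a-ci
subjects:
  - kind: Group
    name: team-a                  # mapped from the SSO provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ci-runner
  apiGroup: rbac.authorization.k8s.io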

2. Cost Implications of Persistent Environments

In most teams, resources remain allocated even during periods of inactivity to accommodate planned pipelines, leading to significant waste of computational power, memory, and network bandwidth. The waste is not just overprovisioned and underutilized clusters, which drive up compute costs; it also includes idle clusters during off-peak hours, which significantly increase control plane costs.

Depending on the pipeline, unattached volumes can be left behind that serve no purpose after task completion. As your pipelines scale, all these costs add up, frustrating your FinOps team.
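As a rough illustration of the housekeeping this forces on teams (assuming kubectl and jq are available), the following lists PersistentVolumes left in the Released phase after their claims were deleted:

# List volumes whose claims are gone but which still incur storage costs
kubectl get pv -o json | jq -r '.items[] | select(.status.phase == "Released") | .metadata.name'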

3. Maintenance Overheads of Persistent Environments

Keeping environments updated, patched, and properly configured requires constant attention from operations teams. With many teams, the challenge grows, because team-induced or controller-induced configuration drift breaks pipelines for other teams.

Monitoring all of this is challenging for operations teams and developers alike. Orphaned resources may also linger in the cluster and break other pipelines, so regular clean-up is essential, adding to the overall maintenance overhead.

4. Challenges with Custom Resource Definitions (CRDs)

Custom Resource Definitions (CRDs) extend Kubernetes beyond its core resource types, enabling administrators to introduce new, application-specific resources into clusters. As an organization, you may run many external CRDs alongside many created internally.

Traditional namespace-based pipeline segregation fails here because CRDs are cluster-scoped, making it impossible to test potentially conflicting CRD versions in separate pipelines on the same cluster. The only options are to create new clusters, with the operational overhead that entails, or to delete the previous CRDs, which might break operations for another team.
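To see why, note that a CustomResourceDefinition carries no namespace: even when its custom objects are namespaced, the definition itself is a single cluster-wide object, so two pipelines cannot install different versions of it side by side. A minimal sketch (the group and kind are made up):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # No namespace field: the CRD itself is cluster-scoped, so there is
  # exactly one "widgets.example.com" definition per cluster
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced            # only the custom objects are namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object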

Ephemeral environments are emerging as a promising way to solve these problems.

Ephemeral Environments as a Solution

Ephemeral means “short-lived,” so ephemeral environments follow the life cycle of your pipelines. In simpler terms, a cluster comes up when your pipeline is triggered, the pipeline’s steps run, and the cluster is deleted. Teams are embracing these environments for leaner workflows, as they reduce resource waste while improving security.

The simplest way to implement an ephemeral environment is to create a new cluster for each pipeline, but given the time it takes to provision a cluster, install the required operators, and configure connections, this process can be slow. However, virtual clusters and automatic cleanup solve these problems:

1. Virtual Clusters for Ephemeral Environments

Virtual Kubernetes clusters from solutions like vCluster, an open source project that lets you create virtual clusters on top of a host cluster, solve the problem of creating a physical cluster for every environment: each pipeline gets a complete, isolated cluster that spins up in just a few seconds.

With vCluster, you get faster environment provisioning, cost savings, and automatic cleanup of your entire resource stack.
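Outside of CI, the same lifecycle can be driven by hand with the vCluster CLI. A minimal sketch, assuming the CLI is installed (the cluster name is made up, and flags can vary between versions):

# Spin up an ephemeral virtual cluster inside its own host namespace
vcluster create ci-run-42 --namespace ci-run-42

# ...run the pipeline's build and test steps against it...

# Tear down the virtual cluster (and its namespace) when the run ends
vcluster delete ci-run-42 --delete-namespace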

2. Auto Clean Up Your CI

Each pipeline run creates numerous Kubernetes resources that can accumulate over time if not properly managed. The solution is a pipeline that cleans up and deletes the underlying resources without manual intervention.

With vCluster, this is as easy as adding the auto-cleanup parameter, which takes care of everything. For example, if you use GitHub Actions with vCluster as your CI infrastructure, you can just set `auto-cleanup: true` to achieve that:

# .github/workflows/vclusters.yaml

# Name of the workflow as it appears in GitHub Actions UI
name: Pull Request Checks

# Trigger this workflow on pull requests targeting the "main" branch
on:
  pull_request:
    branches:
      - "main"

jobs:
  deploy:
    # Use the latest Ubuntu runner for this job
    runs-on: ubuntu-latest
    steps:
      # Step 1: Install the vCluster CLI using the official GitHub Action
      - name: Install vCluster CLI
        uses: loft-sh/setup-vcluster@main

      # Step 2: Authenticate to the vCluster Platform using secrets stored in GitHub
      - name: Login to vCluster Platform instance
        env:
          LOFT_URL: ${{ secrets.LOFT_URL }}        # The URL of your Loft/vCluster instance
          ACCESS_KEY: ${{ secrets.ACCESS_KEY }}    # The access key for authentication
        run: vcluster login $LOFT_URL --access-key $ACCESS_KEY

      # Step 3: Create a new virtual cluster for this pull request
      - name: Create Virtual Cluster for PR
        uses: loft-sh/create-vcluster@main
        with:
          # Name the vCluster using the repository name and PR number for uniqueness
          name: pr-${{ github.event.repository.name }}-${{ github.event.pull_request.number }}
          # Automatically clean up the vCluster when the workflow finishes
          auto-cleanup: true

During the pipeline lifecycle, you will see the cluster being created, tasks executing, and the cluster being deleted. This saves cost, since the host cluster can use the cluster autoscaler to scale the infrastructure down when no virtual clusters are left because no pipelines are running.

3. Kubernetes Custom Resources

The other advantage is syncing between the host and your virtual cluster environment. That might sound complicated, but in simpler terms, you can synchronize objects from your host cluster to the virtual cluster running your pipelines. So you don’t need to wait for platform-layer installations like Istio, secret operators, and others to finish; you can simply import them from the host to save time.

For example, if you want to use Ingress in your pipeline, you don’t need to create all the ingress controllers, resources, and classes yourself. You can just sync them from the host, alongside the following objects (see the configuration sketch after this list):

- CustomResources
- Nodes
- IngressClasses
- StorageClasses
- CSINodes
- CSIDrivers
- CSIStorageCapacities
- ConfigMaps
- Secrets
- Events

If you want, you can do the opposite and sync from your pipeline’s virtual cluster to the host using the toHost option, as sketched below. On top of that, there are integrations that configure Istio, cert-manager, and other tools to work seamlessly within both the host and virtual clusters.
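A rough vcluster.yaml sketch of what such syncing can look like; exact keys depend on your vCluster version, so treat the selection below as an illustrative assumption rather than a definitive reference:

# vcluster.yaml (illustrative): sync selected objects between host and virtual cluster
sync:
  fromHost:
    # Reuse the host's ingress and storage setup instead of reinstalling it
    ingressClasses:
      enabled: true
    storageClasses:
      enabled: true
    nodes:
      enabled: true
  toHost:
    # Expose workloads created inside the virtual cluster on the host
    ingresses:
      enabled: true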

Patches, like the networking patch, let you specify which fields to edit, exclude, or override during syncing, giving you full control over the virtual clusters.

Lastly, vCluster enforces isolation at the CRD level, just like a real cluster, ensuring that any pipeline you’re running won’t interfere with others, despite being on the same host cluster. If you’re curious about the architecture, you can learn more about it here.

4. Improved DevEx

Improved Developer Experience (DevEx) is a priority for most organizations. With solutions like vCluster, developers benefit from rapid environment provisioning in just a few minutes, allowing them to focus on building and innovating rather than waiting for resources to become available.

Additionally, vCluster provides real cluster access, enabling teams to debug issues directly within their environments. Developers can investigate and resolve pipeline failures without needing special admin permissions or going through lengthy IT approval processes, making them more efficient and self-sufficient.
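For instance, reusing the PR naming scheme from the workflow above (the repository name and workload below are hypothetical), a developer could connect to a failing run’s virtual cluster and debug it directly:

# Point the local kubeconfig at the PR's virtual cluster
# (depending on your setup, a namespace or project flag may also be needed)
vcluster connect pr-myrepo-42

# Debug with full, isolated cluster access
kubectl get pods --all-namespaces
kubectl logs deploy/my-app   # hypothetical workload name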

5. Happy Security Teams

Security teams are especially pleased with virtual clusters like vCluster because they allow organizations to integrate with their own Single Sign-On (SSO) systems, ensuring robust authentication and streamlined user management. This integration means teams can leverage their existing identity providers to control access, mapping users and groups directly to cluster-level privileges using Kubernetes RBAC, without exposing or risking the underlying host cluster.
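As a hedged sketch of that mapping (the group name is hypothetical), an SSO group can be granted admin rights with plain Kubernetes RBAC applied inside the virtual cluster, while holding no comparable rights on the host:

# Applied inside the virtual cluster only: the "platform-devs" SSO group
# gets cluster-admin here, with no equivalent binding on the host cluster
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-devs-admin
subjects:
  - kind: Group
    name: platform-devs          # mapped from the SSO/identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io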

Within a vCluster, users have admin-level access to manage resources, debug, and configure policies independently, but their permissions on the host cluster remain strictly minimized. This strong isolation ensures that even if a user has full control over their virtual cluster, they cannot compromise or alter the security or configuration of the host infrastructure. Security teams value this architecture because it enforces the principle of least privilege, reduces the risk of privileged access misuse, and maintains a clear separation between tenant and host environments, dramatically lowering the potential attack surface and safeguarding core infrastructure assets.

Final Thoughts

This post has shown how organizations are evolving to optimize their workflows and deliver more business continuity and value. Adopting ephemeral environments is more than a trend; it’s a proven strategy for accelerating development, improving security, and reducing operational overhead. Take Ada, a 500+ person organization, for example: by switching to vCluster, their teams were able to spin up isolated Kubernetes environments on demand, making development and testing smoother and more efficient.

Developers no longer had to worry about resource conflicts or waiting for shared environments, and everything from CI integration to cleanup became much simpler. If you’re curious about how Ada made this work and the real benefits they saw, check out the full case study here.

More Questions? Join our Slack to talk to the team behind vCluster!
