Kubernetes AI + HPC Day NA 2023

Nov 8, 2023
KubeCon + CloudNativeCon | North America 2023

Running and managing a large number of Kubernetes clusters on bare metal poses significant challenges, from security to GPU provisioning to scalability. Specialized cloud provider CoreWeave experienced these first-hand, operating 3,000+ Kubernetes clusters on top of 5,000 bare metal nodes with large numbers of GPUs to power modern AI applications at scale. In this session, we’ll dive into these challenges and how CoreWeave partnered with Loft Labs, the maintainers of vcluster, to create a serverless Kubernetes experience for numerous companies running AI workloads at scale. The session covers the pitfalls, design choices, and architectural challenges the teams have dealt with over the course of 3 years while evolving CoreWeave's serverless Kubernetes offering, including:

- Secure Isolation Of Tenants On A Shared Infrastructure
- Challenges In Achieving 10-Second Autoscaling
- On-Demand Cluster & Compute Provisioning For Tenants
- Day 2 Operations & Managing A Fleet Of Clusters At Scale

Speakers

Lukas Gentele
Co-Founder & CEO at Loft Labs
