Join us for a hands-on technical workshop and Brazilian churrasco experience at Fogo de Chão.
Discover how the world's largest AI inference workloads run at lightning speed on NVIDIA Dynamo, an open-source framework for distributed model serving.
In this 1-hour workshop, Harry Kim (NVIDIA) and Philip Kiely (Baseten) will dive deep into system-level optimizations that turbocharge LLM inference at scale, including:
- KV-aware routing
- KV cache offloading
- Prefill/decode (PD) disaggregation
After the session and Q&A, stay for a churrasco lunch. Enjoy eight different meats, a fresh salad bar, and traditional sides.
If you’re an AI engineer in SF, don’t miss this technical workshop and the chance to network with peers. Lunch is on NVIDIA and Baseten!
✅ Follow Baseten on Twitter & LinkedIn
✅ Follow NVIDIA on Twitter & LinkedIn