Horizontal scaling via replicas with load balancing is an important technique for handling high traffic to an ML model.
This post simplifies instance sizing with heuristics to choose an optimal size for your model, balancing performance and compute cost.
2022's rapid ML advancements felt like a decade. Excited for 2023, we anticipate foundational models will further empower scientists and developers in ML apps.
Riffusion is a fine-tuned version of Stable Diffusion. Baseten served Riffusion over four million times in a couple of days, serving top-of-hacker-news traffic.
Baseten's development deployments speed up ML model dev loops, replacing slow workflows with a live reload system for quick, seconds-long testing updates.
October was a big month for the ML industry, with more momentum than ever behind spooky-good models and novel applications
Deploy OpenAI Whisper for free on Baseten instantly from our model library. Or stick around to learn how to package and deploy Whisper with Truss.
The state of cutting-edge open-source ML models, a more flexible interface for invoking models, and robust application development workflows