If ML progress in 2022 makes you feel like the year was a decade, you’re not alone. Believe it or not, GitHub Copilot was released just 14 months ago, Stable Diffusion 5 months ago, Whisper 3 months ago, and ChatGPT just last month. We can’t wait to see what 2023 brings in the world of ML — we’re betting that foundational models will empower even more data scientists, ML practitioners, and developers to build ML-powered applications.
Hosting Riffusion: Stable Diffusion fine-tuned to generate music
Riffusion, a generative model for creating music, is a viral project by Seth Forsgren and Hayk Martiros hosted on Baseten. How viral? It was the twelfth-most-upvoted post of the year on Hacker News, and its backend processed a little over 4 million song requests, peaking at around 34 requests per second.
Achieving this feat of scale came down to three primary tactics:
- Load balancing across 50 Nvidia A10G GPUs
- Caching: checking queries against a database and returning stored responses to reduce model invocations
- Cost-effective scaling using a mixture of spot and on-demand instances
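To make the caching tactic concrete, here is a minimal Python sketch of checking a query against a store before invoking the model. It is an illustration only, not Riffusion's or Baseten's actual implementation: `CachedModel`, `cache_key`, and the in-memory dictionary standing in for the database are all hypothetical names chosen for this example.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


def cache_key(query: Dict[str, Any]) -> str:
    """Hash a query deterministically so identical queries share one key."""
    canonical = json.dumps(query, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CachedModel:
    """Wraps a model function and returns stored responses for repeat queries."""

    def __init__(
        self,
        model_fn: Callable[[Dict[str, Any]], Any],
        store: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.model_fn = model_fn
        # Assumption: a plain dict stands in for the real database.
        self.store = {} if store is None else store
        self.invocations = 0  # counts how often the model actually runs

    def predict(self, query: Dict[str, Any]) -> Any:
        key = cache_key(query)
        if key in self.store:
            return self.store[key]  # cache hit: no model invocation
        self.invocations += 1
        result = self.model_fn(query)
        self.store[key] = result
        return result


# Hypothetical stand-in for an expensive model call.
model = CachedModel(lambda q: "song for " + q["prompt"])
first = model.predict({"prompt": "jazz piano", "seed": 1})
second = model.predict({"seed": 1, "prompt": "jazz piano"})  # same query, different key order
```

Sorting the keys before hashing means two requests with the same content map to the same cache entry, so the second call above is served from the store without running the model again.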
Read the full story on Twitter or Baseten’s Blog.
Manage your model’s resources
By default, models deployed on Baseten run on a single instance with 1 vCPU and 2 GiB of RAM. This instance size is sufficient for some models and workloads, but demanding models and high-traffic applications need more resources to operate. Workspaces on any paid plan can now upgrade their own model resources and configure autoscaling.
The default 1x2 instance type is free, but higher resource configurations are subject to usage-based pricing, billed monthly. When you configure model resources, you can see the hourly rate for the selected instance type (usage is charged by the minute) along with an estimated monthly spend based on replica count.
If you're in a paid workspace and want to update your model resources, contact us.
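As a rough illustration of how an hourly rate, per-minute charging, and replica count combine, the following Python sketch does the arithmetic. The rates are made-up placeholders, the 730-hour month is an assumption, and the function names are hypothetical; this is not Baseten's pricing formula.

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 hours / 12 months


def estimated_monthly_spend(
    hourly_rate: float, replicas: int, hours_per_month: float = HOURS_PER_MONTH
) -> float:
    """Estimate spend if every replica runs for the whole month."""
    return hourly_rate * replicas * hours_per_month


def per_minute_charge(hourly_rate: float, minutes: float) -> float:
    """Cost of one instance that ran for a given number of minutes."""
    return hourly_rate / 60 * minutes


# Hypothetical rate of $0.50 per hour with 2 replicas running all month.
monthly = estimated_monthly_spend(0.50, replicas=2)

# The same hypothetical instance running for 90 minutes.
short_run = per_minute_charge(0.50, minutes=90)
```

With autoscaling, the number of running replicas varies over time, so an estimate like this is best read as an upper bound for a fixed replica count.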
Live reload with draft models
A live reload workflow, where you redeploy and test code in real time, is the same superpower that web developers have enjoyed for decades. When you make changes to a draft model deployed on Baseten, the model server checks if it can hot-swap the new code in place of the code that is currently running. For example, if you update your pre-processing function to parse input differently, that new function can be swapped in and run immediately, without shutting down and rebuilding your model serving environment.
Draft models let you edit and redeploy models without rebuilding the containers they run in, which makes the path from local testing to production much faster. By saving time on rebuilds, draft models accelerate common deployment tasks by 100x.
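The hot-swap idea can be sketched in a few lines of Python. This is a toy model of the concept, not Baseten's model server: `DraftModelServer`, `can_hot_swap`, and `update_preprocess` are hypothetical names, and the swap check here (is the replacement callable?) is a deliberately simplified assumption.

```python
from typing import Any, Callable


class DraftModelServer:
    """Toy server whose pre-processing step can be replaced while it runs."""

    def __init__(
        self, preprocess: Callable[[Any], Any], predict: Callable[[Any], Any]
    ) -> None:
        self._preprocess = preprocess
        self._predict = predict

    def can_hot_swap(self, new_preprocess: Any) -> bool:
        # Assumption: any callable is safe to swap in. A real server would
        # also need to check things like changed dependencies.
        return callable(new_preprocess)

    def update_preprocess(self, new_preprocess: Any) -> bool:
        """Swap in new code in place; return False if a rebuild is needed."""
        if not self.can_hot_swap(new_preprocess):
            return False
        self._preprocess = new_preprocess
        return True

    def invoke(self, raw: Any) -> Any:
        return self._predict(self._preprocess(raw))


# The "model" here just counts the parsed items.
server = DraftModelServer(preprocess=lambda raw: raw.split(","), predict=len)
before = server.invoke("a,b,c")

# Change how input is parsed without restarting the server object.
swapped = server.update_preprocess(lambda raw: raw.split(";"))
after = server.invoke("a;b")
```

The server object, and whatever model it has loaded, stays alive across the update; only the function reference changes, which is why no rebuild is needed in this sketch.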
Get all the details on our blog.
Data + Curiosity
Data + Curiosity is a weekly series where our very own Jesse Mostipak interviews data science practitioners on their craft. Subscribe on YouTube to catch new episodes every Tuesday morning!
A 2023 sneak peek: Blueprint!
Blueprint, by Baseten, is a new platform for building with generative AI. Join the waitlist for early access and updates. For a quick look at what’s coming, check out our blog post on the new web IDE.
Thank you to everyone for all the momentum we built in 2022. We're looking forward to the growing energy in the ML space in 2023!
— The team at Baseten