Deploying and managing ML models is smoother than ever with an all-new model management experience. Plus, a deep dive into text embeddings models and a cool side project by Baseten engineer Varun Shenoy: check out misgif.app — it’s the most fun you’ll have with AI all week!
This week, we overhauled the model management experience on Baseten, improving several core workflows to clarify the model lifecycle. None of these changes are breaking; they simply make it easier to deploy and serve ML models in a performant, scalable, and cost-effective way.
We also shipped:
Workspace API keys with more granular permissions
Separate measurements for end-to-end response time and inference time
In October, Jina AI launched a new text embedding model that matches OpenAI’s ada-002 in both context window size and benchmark performance. You can learn more about the model on our blog or deploy it for yourself from the model library.
Text embedding models don’t get the same headlines as LLMs or new editions of Stable Diffusion, but they’re an essential tool for building real-world applications with AI. Text embedding models encode the semantic meaning of a chunk of text by converting it into a fixed-length vector of floating-point numbers. Then, these vectors can be compared for search, recommendations, and classification. Text embedding models also unlock retrieval-augmented generation for LLMs, letting you build accurate models on top of datasets without fine-tuning the underlying model.
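As a minimal sketch of the comparison step, here is cosine similarity in pure Python. The vectors below are tiny made-up examples standing in for real model output; an actual text embedding model returns fixed-length vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors.
    Values near 1.0 mean similar meaning; near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for illustration only; real models
# produce fixed-length vectors (e.g. 768 or 1536 floats per text).
king = [0.9, 0.1, 0.4, 0.2]
queen = [0.8, 0.2, 0.5, 0.1]
pizza = [0.1, 0.9, 0.0, 0.7]

# Semantically related texts score higher than unrelated ones.
assert cosine_similarity(king, queen) > cosine_similarity(king, pizza)
```

In a search or RAG pipeline, you'd embed your documents once, embed each incoming query, and rank documents by this similarity score to retrieve the most relevant chunks.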
To get started building with text embedding models, read our new introduction to open source text embeddings.
Along with the new model management experience and other product features, we also shipped all-new docs in October at docs.baseten.co.
Visit the docs for guides to:
Plus, there are references for instance types and API endpoints. And for all of your model packaging and deployment needs, don’t forget about the Truss docs, which we improved with new example models backed by a nightly CI job.
Baseten CEO Tuhin Srivastava joined hosts Chris Benson and Daniel Whitenack on the Practical AI podcast for a conversation about self-hosting and scaling models. Give the episode a listen for thoughts on the future of ML infrastructure.
After an amazing evening in NYC during #TechWeek (big thanks to all of our panelists and everyone who came to the event), we’re hosting a fireside panel on the state of open source ML in San Francisco mid-month. We’ll also be at AWS re:Invent in NVIDIA’s generative AI pavilion at the end of the month. We hope to see you there!
We’ll be back next month with more from the world of open-source AI and ML!
Thanks for reading!
— The team at Baseten