Deploying and managing ML models is smoother than ever with an all-new model management experience. Plus, a deep dive into text embeddings models and a cool side project by Baseten engineer Varun Shenoy: check out misgif.app — it’s the most fun you’ll have with AI all week!
This week, we overhauled the model management experience on Baseten, improving several core workflows to clarify the model lifecycle. None of these changes are breaking; they simply make it easier to deploy and serve ML models in a performant, scalable, and cost-effective way.
We also shipped:
Workspace API keys with more granular permissions
Separate measurements for end-to-end response time and inference time
In October, Jina AI launched a new text embedding model that matches OpenAI’s ada-002 in both context window size and benchmark performance. You can learn more about the model on our blog or deploy it for yourself from the model library.
Text embedding models don’t get the same headlines as LLMs or new editions of Stable Diffusion, but they’re an essential tool for building real-world applications with AI. Text embedding models encode the semantic meaning of a chunk of text by converting it into a fixed-length vector of floating-point numbers. Then, these vectors can be compared for search, recommendations, and classification. Text embedding models also unlock retrieval-augmented generation for LLMs, letting you build accurate models on top of datasets without fine-tuning the underlying model.
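As a minimal sketch of the comparison step, here is cosine similarity in pure Python. The vectors below are tiny made-up examples standing in for real model output; an actual text embedding model returns fixed-length vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors.
    Values near 1.0 mean similar meaning; near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for illustration only; real models
# produce fixed-length vectors (e.g. 768 or 1536 floats per text).
king = [0.9, 0.1, 0.4, 0.2]
queen = [0.8, 0.2, 0.5, 0.1]
pizza = [0.1, 0.9, 0.0, 0.7]

# Semantically related texts score higher than unrelated ones.
assert cosine_similarity(king, queen) > cosine_similarity(king, pizza)
```

In a search or RAG pipeline, you'd embed your documents once, embed each incoming query, and rank documents by this similarity score to retrieve the most relevant chunks.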
To get started building with text embedding models, read our new introduction to open source text embeddings.
Along with the new model management experience and other product features, we also shipped all-new docs in October at docs.baseten.co.
Visit the docs for guides to:
Plus, there are references for instance types and API endpoints. And for all of your model packaging and deployment needs, don’t forget about the Truss docs, which we improved with new example models backed by a nightly CI job.
Baseten CEO Tuhin Srivastava joined hosts Chris Benson and Daniel Whitenack on the Practical AI podcast for a conversation about self-hosting and scaling models. Give the episode a listen for thoughts on the future of ML infrastructure.
After an amazing evening in NYC during #TechWeek (big thanks to all of our panelists and everyone who came to the event), we’re hosting a fireside panel on the state of open source ML in San Francisco mid-month. We’ll also be at AWS re:Invent in NVIDIA’s generative AI pavilion at the end of the month. We hope to see you there!
We’ll be back next month with more from the world of open-source AI and ML!
Thanks for reading!
— The team at Baseten