New in September 2023

TL;DR

There’s a brand new open source LLM in town — more evidence for why open source AI will win. Plus, a breakdown of the A10 and A100 GPUs for common inference tasks, new model observability features on Baseten, and a new series of in-person events in NYC and SF exploring open source AI.

Mistral 7B: a new best-in-class LLM

Mistral 7B is a new foundation model from Mistral AI. It’s not just the best seven-billion-parameter open source LLM to date; it also surpasses Llama 2 13B on most benchmarks and has strong code-generation capabilities.

truss predict -d '{"prompt": "What is the Mistral wind?"}'

The Mistral is a powerful, cold wind that blows from the northeast
through the French Alps and across southern Europe. It is known for
its strong, steady gusts that can reach up to 100 mph and can persist
for days or even weeks. The Mistral is a significant weather phenomenon
in this region, and it is often associated with clear skies and rapid
changes in temperature. It is also considered an important factor in
regional climate and ecology.

There’s a lot to be excited about with Mistral 7B:

  • It outperforms recent 13-billion-parameter models like Llama 2 13B on popular benchmarks, despite having only 7 billion parameters (so you can run it on an A10 GPU).

  • Mistral released its models under a truly open source Apache 2.0 license.

  • It was released with both a chat-tuned instruct variant and a base variant, so expect to see new fine-tuned models that build on Mistral 7B’s performance.

  • Its sliding window attention, where each token attends only to the most recent 4k tokens rather than the full context, is an interesting new approach to attention during inference (see the sketch below).
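
To build intuition for what a sliding attention window means, here’s a minimal, hypothetical sketch (not Mistral’s actual implementation) of the mask it implies: each token can only attend to itself and a fixed number of tokens before it.

import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # True where query position i may attend to key position j:
    # causal (j <= i) and within the last `window` positions (j > i - window).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Toy example: 8 tokens with a window of 4 (Mistral 7B uses a 4,096-token window).
# Each row is a query position; 1s mark the key positions it can attend to.
print(sliding_window_mask(8, window=4).astype(int))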

Mistral dives into more technical details in the model announcement.

Get started with Mistral 7B behind an API endpoint in minutes from our model library.
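
Once deployed, the model can be called with the Truss CLI as above or over plain HTTP. Here’s a minimal sketch in Python; the model ID and API key below are placeholders, and the endpoint URL may differ for your deployment, so check your model’s page in Baseten for the exact invocation details:

import requests

MODEL_ID = "YOUR_MODEL_ID"        # placeholder: find this on your model's page
API_KEY = "YOUR_BASETEN_API_KEY"  # placeholder: your Baseten API key

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"prompt": "What is the Mistral wind?"},
)
resp.raise_for_status()
print(resp.json())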

A10 vs A100 GPUs for ML inference

The A10 and A100 GPUs are two of the most popular GPUs for model inference on Baseten. The A100 is considerably more powerful than the A10 and has 80 GB of VRAM versus the A10’s 24 GB, so it can run much larger models.

[Chart: Stable Diffusion inference, A10 vs A100]

But the A100 is also five times more expensive to use than the A10. So which workloads justify the additional cost, and which can run on the A10? For a full comparison of the specs and use cases for these two popular GPUs, take a look at our A10 vs A100 breakdown.
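
As a rough rule of thumb (a back-of-the-envelope sketch that ignores activations, KV cache, and runtime overhead), model weights alone take roughly parameter count times bytes per parameter. That’s why a 7-billion-parameter model in 16-bit precision fits comfortably in the A10’s 24 GB of VRAM, while much larger models call for the A100’s 80 GB:

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    # Approximate VRAM for model weights alone; fp16/bf16 weights are 2 bytes each.
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(7e9))   # ~14 GB  -> fits on an A10 (24 GB of VRAM)
print(weight_memory_gb(13e9))  # ~26 GB  -> needs an A100 (80 GB) or quantization
print(weight_memory_gb(70e9))  # ~140 GB -> multiple GPUs or heavy quantization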

Model observability features on Baseten

[Chart: Inference volume broken down by 200, 400, and 500 response codes]

In September, we invested in model observability to give you more control over model performance and behavior. New features include the inference volume chart above, broken down by response code, and a replicas chart that separates active replicas from those still starting up:

[Chart: Replicas chart showing both active and starting replicas]

Open Source Live: an event series on the future of ML

Join us in New York and San Francisco in the upcoming weeks and months for a series of panels, meetups, and hackathons designed to bring together builders in the open source space.

[Image: Open source chatbot leadership summit poster]

The series kicks off October 17th at New York #Techweek with an open source chatbot leadership summit hosted by Baseten CEO Tuhin Srivastava. For a full list of events and registration options, see our event series website at opensourcelive.ai.

We’ll be back next month with more from the world of open source AI and ML!

Thanks for reading!

— The team at Baseten