A checklist for switching to open source ML models

Switching from a closed-source ecosystem, where you consume ML models from API endpoints, to the world of open source ML models can seem intimidating. But this checklist will give you all of the resources you need to make the leap.

Pick an open source model

The biggest advantage of the open source ecosystem in ML is the sheer number and variety of models to choose from. But that amount of choice can be overwhelming. Here are some alternatives to closed-source models to get you started:

Choose a GPU for model inference

Inference for most generative models like LLMs requires GPUs. Picking the right GPU is essential: you want the least expensive GPU powerful enough to run the model with acceptable performance.

For a 7 billion parameter LLM like Mistral 7B, you usually want an A10. A10s also deliver great performance for smaller models like Whisper and Bark, which can fit on the less expensive T4 as well, at the cost of longer generation times. And text embedding models don’t need a GPU at all, though a T4 can accelerate inference.
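A useful rule of thumb behind these recommendations: model weights dominate GPU memory use, at roughly two bytes per parameter in fp16. A minimal sketch of that back-of-the-envelope math (the 20% overhead factor is an assumption; real usage varies with batch size and sequence length):

```python
# Rough VRAM estimate for LLM inference. Assumes fp16 weights
# (2 bytes/param) plus ~20% overhead for activations and KV cache —
# the overhead factor is an illustrative assumption, not a measurement.

def estimate_vram_gb(params_billion: float,
                     bytes_per_param: int = 2,
                     overhead: float = 1.2) -> float:
    """Approximate GPU memory needed to serve a model, in GB."""
    return params_billion * bytes_per_param * overhead

# Mistral 7B in fp16: ~16.8 GB, which fits on a 24 GB A10
# but not a 16 GB T4.
print(round(estimate_vram_gb(7), 1))
```

The same arithmetic explains why Whisper and Bark (around 1–2 billion parameters) fit comfortably on a T4.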

Here are some buyer’s guides to GPUs:

Find optimizations relevant to your use case

If you’re just experimenting with open source models or you need to get something in production yesterday, you can skip this step. But one of the most powerful things that switching to open source models unlocks is the ability to optimize a balance of latency, throughput, quality, and cost to align with your use case.
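One concrete example of this balancing act is batching: serving more requests per forward pass raises throughput and cuts per-request cost, but each request waits longer. A sketch with purely illustrative numbers (the latency model and GPU price below are assumptions, not benchmarks):

```python
# Illustrative (not measured) numbers showing the batching tradeoff:
# larger batches raise throughput and cut per-request cost, at the
# price of higher per-request latency.

def batch_tradeoff(batch_size: int,
                   base_latency_s: float = 1.0,
                   per_item_s: float = 0.1,
                   gpu_cost_per_hr: float = 1.20):
    latency = base_latency_s + per_item_s * batch_size
    throughput = batch_size / latency              # requests per second
    cost_per_1k = gpu_cost_per_hr / 3600 / throughput * 1000
    return latency, throughput, cost_per_1k

for bs in (1, 8, 32):
    lat, tput, cost = batch_tradeoff(bs)
    print(f"batch={bs:2d}  latency={lat:.1f}s  "
          f"throughput={tput:.2f} req/s  ${cost:.3f}/1k req")
```

Quantization, continuous batching, and optimized inference servers shift these curves further; the right operating point depends on whether your use case is latency-sensitive (chat) or throughput-sensitive (batch processing).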

Get started with:

Deploy your model

Once you have your model and hardware configuration, it’s time to deploy. You can deploy a curated selection of models from our model library in just a couple of clicks or use Truss, our open source model packaging framework, to get any model up and running behind an API endpoint.
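At its core, a Truss wraps your model in a small class with a `load` method (run once at startup) and a `predict` method (run per request). A minimal sketch of that shape, with placeholder echo logic standing in for real model loading and inference:

```python
# Minimal sketch of the Model class shape a Truss packages, with
# placeholder logic — real code would load weights in load() and run
# inference in predict().

class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Load weights once at startup (e.g. from Hugging Face).
        # Placeholder: an echo function stands in for a real model.
        self._model = lambda prompt: f"echo: {prompt}"

    def predict(self, model_input: dict) -> dict:
        prompt = model_input["prompt"]
        return {"output": self._model(prompt)}

# Local smoke test of the interface:
m = Model()
m.load()
print(m.predict({"prompt": "hello"}))
```

Separating `load` from `predict` means expensive weight loading happens once per replica, not once per request.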

Dive into deployment with:

Integrate your new model endpoint

Once you’ve deployed your model, the next step is to call its API endpoint from your application.
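In practice this is an authenticated HTTP request. A hedged sketch using only the standard library, where the URL, model ID, auth scheme, and payload schema are all hypothetical placeholders (check your deployment’s docs for the real ones):

```python
# Sketch of calling a deployed model endpoint over HTTP. The URL,
# model ID, header scheme, and payload shape below are hypothetical
# placeholders, not a real provider's API.
import json
import urllib.request

def build_request(api_key: str, model_id: str, prompt: str) -> urllib.request.Request:
    url = f"https://model-{model_id}.api.example.com/predict"  # hypothetical
    body = json.dumps({"prompt": prompt}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("YOUR_API_KEY", "abc123", "Write a haiku about GPUs")
# response = urllib.request.urlopen(req)  # uncomment to actually send it
print(req.get_header("Content-type"))
```

The same pattern works from any language; the only moving parts are the endpoint URL, the API key header, and the JSON schema your model expects.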

Baseten has guides for:

Another great way to build with LLMs is to use a tool like LangChain as an abstraction on top of your model endpoint, which helps with switching between models, APIs, and providers.
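The core idea behind such an abstraction, stripped of any particular framework’s API, is that application code targets a common interface so backends are interchangeable. A plain-Python sketch (both client classes are hypothetical stand-ins, not real SDKs):

```python
# The idea behind an abstraction layer like LangChain, in plain Python:
# application code depends on a common interface, so swapping model
# providers is a one-line change. Both clients are hypothetical stubs.
from typing import Protocol

class LLM(Protocol):
    def generate(self, prompt: str) -> str: ...

class OpenSourceEndpoint:
    """Hypothetical client for a self-hosted open source model."""
    def generate(self, prompt: str) -> str:
        return f"[mistral-7b] {prompt}"

class ClosedAPI:
    """Hypothetical client for a closed-source provider."""
    def generate(self, prompt: str) -> str:
        return f"[closed-api] {prompt}"

def summarize(llm: LLM, text: str) -> str:
    # Application logic never names a provider — only the interface.
    return llm.generate(f"Summarize: {text}")

print(summarize(OpenSourceEndpoint(), "open source ML"))
print(summarize(ClosedAPI(), "open source ML"))
```

This is why such a layer makes evaluation easier: you can A/B an open source model against your current closed-source API without touching application logic.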

If you want to dive deeper, check out our guide to open source alternatives for ML models. Wherever you are in your journey from evaluation to adoption for open source ML models, we’re here to help at support@baseten.co.