Baseten’s model library is an easy way to deploy your own instance of popular open-source models, whether you’re just starting to familiarize yourself with a model or need to support a production-level use case. Here we’ll share more about how model deployment from our model library works. Deploy an open-source model today: new users get $30 of model resource credit to get started for free.
If you’re looking to further adapt a model, you can clone the model from GitHub. And you don’t have to start with a model in our model library: you can deploy any model on Baseten, either your own or an open-source model we haven’t added to our library yet, by packaging it with Truss, Baseten’s open-source model packaging framework.
In recent months, open-source model quality has improved dramatically, to the point that entire products are being built on top of open-source foundation models. Whereas open-source models used to be a good starting place for experimentation before training your own model, companies are now getting real value from out-of-the-box open-source models, assuming they can figure out how to deploy and serve these models at scale (which is where we come in!).
Just to give one example, at the end of 2021, we collaborated with Patreon’s machine learning team to build an audio transcription tool for content moderation on top of open-source wav2vec. While this initial transcription tool was a decent proof-of-concept, the output quality left a bit to be desired. Fast forward to 2023 and a similar transcription tool, now using the more advanced open-source audio transcription model Whisper, is being used to transcribe and create subtitles for thousands of hours of content each week.
We’re really excited about this progression in open-source model quality and the products being built with these models as a result, and we know this is just the beginning. We built the Baseten model library so you can quickly and easily deploy popular open-source models, then serve them performantly, scalably, and cost-effectively as your products take off.
There are many other services that let you make API calls to shared instances of open-source models. This can be a great, inexpensive way to familiarize yourself with an open-source model. But as you start to build for production use cases, you’re going to need your own instance of the model so that you can control and feel confident in the model’s performance.
When you deploy and serve an open-source model from our model library, you get your own model instance and the full power of Baseten’s model serving infrastructure behind that model. This includes full visibility and control over your model with:
Autoscaling (Horizontal scaling): Configure min and max replicas to handle more concurrent requests when demand is high and save infrastructure costs when demand is low.
Model logs: Deployment and prediction logs record activity on your model, including HTTP response codes and any errors or warnings raised during model invocation.
Model metrics: Understand your model’s prediction volume, response time, CPU and memory usage, and available replicas.
Model resource management (Vertical scaling): Model library models are assigned compatible resources by default, but you can choose to allocate additional memory, additional CPUs, or more powerful GPUs as needed.
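For models packaged with Truss, vertical scaling corresponds to the `resources` section of the model’s `config.yaml`. As an illustration, a configuration like the one below requests an A10G-backed instance; the specific values are hypothetical, not defaults:

```yaml
# resources section of a Truss config.yaml (illustrative values)
resources:
  accelerator: A10G  # GPU type; omit for CPU-only models
  use_gpu: true
  cpu: "4"           # vCPU count
  memory: 16Gi       # RAM
```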
Baseten charges by the minute for the model resources you use when your open-source model is actively deploying, scaling up or down, or making predictions. For example, Whisper works well on an A10G instance with 24 GiB of video memory, 4 vCPUs, and 16 GiB of RAM.
Want to learn more about choosing the right GPUs for your model? Baseten technical writer Philip Kiely has a great series on our blog. Start here: Comparing GPUs across architectures and tiers
Baseten’s per-minute pricing is predictable because it’s all-inclusive of GPU, CPU, and memory.
| Instance | GPU | VRAM | vCPUs | RAM | Price per minute |
|---|---|---|---|---|---|
| T4x4x16 | NVIDIA T4 | 16 GiB | 4 | 16 GiB | $0.01753 |
| T4x8x32 | NVIDIA T4 | 16 GiB | 8 | 32 GiB | $0.02507 |
| T4x16x64 | NVIDIA T4 | 16 GiB | 16 | 64 GiB | $0.04013 |
| A10Gx4x16 | NVIDIA A10G | 24 GiB | 4 | 16 GiB | $0.03353 |
| A10Gx8x32 | NVIDIA A10G | 24 GiB | 8 | 32 GiB | $0.04040 |
| A10Gx16x64 | NVIDIA A10G | 24 GiB | 16 | 64 GiB | $0.05413 |
| V100x8x61 | NVIDIA V100 | 16 GiB | 8 | 61 GiB | $0.10200 |
| A100x12x144 | NVIDIA A100 | 80 GiB | 12 | 144 GiB | $0.17083 |
Scale to zero is enabled for all model library models, so you only pay when your models are actually being used. Cold starts are blazing fast, with Stable Diffusion starting up in six seconds and Whisper in nine.
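Because billing is per active minute and idle models scale to zero, a back-of-the-envelope monthly estimate is just active minutes times the instance rate. Here’s a quick sketch using the A10Gx4x16 rate from the table above; the workload numbers are made up for illustration:

```python
# Rough monthly cost estimate for per-minute GPU billing.
# Rate is USD per minute of active (non-scaled-to-zero) time.

A10G_RATE = 0.03353  # A10Gx4x16 rate from the pricing table above

def monthly_cost(active_minutes_per_day: float, rate_per_minute: float, days: int = 30) -> float:
    """Estimate monthly spend for a model that scales to zero when idle."""
    return active_minutes_per_day * days * rate_per_minute

# Hypothetical workload: Whisper active 90 minutes per day
print(round(monthly_cost(90, A10G_RATE), 2))  # 90 * 30 * 0.03353 ≈ 90.53
```

Swap in your own rate and usage to size up any instance type in the table.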
From the model library, you also have access to each model’s Truss package, which you can download and edit to further adapt the model.
Truss is an open-source model packaging library developed by Baseten. Using Truss makes model deployment an interactive, configurable, reliable process and also lets you store, share, and version control your model however you'd like.
For example, you could edit the model’s Truss to modify pre- and post-processing functions that format inputs and outputs according to your product’s needs. Or maybe you need to securely access secrets during model invocation. Or you could further modify the model’s behavior with different tokenizers or schedulers.
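A Truss exposes these hooks in its `model/model.py`, where a `Model` class defines `load`, `predict`, and optional `preprocess`/`postprocess` methods. Here’s a minimal sketch of what such an edit could look like; the placeholder model and the specific lowercasing/uppercasing logic are invented for illustration, not taken from any library model:

```python
# Sketch of a model/model.py in a Truss, showing pre/post-processing hooks.
# A real Truss would load actual model weights in load(); this uses a stub.

class Model:
    def __init__(self, **kwargs):
        # Truss passes config and (if configured) secrets via kwargs
        self._secrets = kwargs.get("secrets")
        self._model = None

    def load(self):
        # Runs once per replica at startup; stubbed with a toy "model"
        self._model = lambda text: text[::-1]

    def preprocess(self, model_input: dict) -> dict:
        # Normalize inputs to the shape the model expects
        model_input["text"] = model_input["text"].strip().lower()
        return model_input

    def predict(self, model_input: dict) -> dict:
        return {"output": self._model(model_input["text"])}

    def postprocess(self, model_output: dict) -> dict:
        # Format the response for your product's needs
        model_output["output"] = model_output["output"].upper()
        return model_output
```

When the packaged model is invoked, Truss chains these hooks in order: preprocess, then predict, then postprocess.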
You can visit the Baseten model library to browse and deploy open-source models. New users get $30 of model resource credit to get started deploying and serving models for free. And new models get added every week or two; we recently added Mistral 7B, Bark, and Stable Diffusion XL.
If you don’t see the open-source model you’re looking for, know that you can deploy any model on Baseten by packaging it with Truss, our open-source model packaging library. Or just reach out to email@example.com and we can help.