Machine learning infrastructure that just works
Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.
Get started in minutes without getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models. Learn more about model deployment
truss init --example stable-diffusion-2-1-base ./my-sd-truss
Serializing Stable Diffusion 2.1 truss.
Making contact with Baseten 👋 👽
🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G
Open-source model packaging. Meet Truss, a seamless bridge from model development to model delivery. Truss presents an open-source standard for packaging models built in any framework for sharing and deployment in any environment, local or production.
from transformers import GenerationConfig, LlamaForCausalLM, LlamaTokenizer, TextIteratorStreamer
from threading import Thread
import torch

class Model:
    def __init__(self, **kwargs):
        self._secrets = kwargs["secrets"]
        self._model = None
        self._tokenizer = None

    def load(self):
        self._model = LlamaForCausalLM.from_pretrained(
            "meta-llama/Llama-2-70b-chat-hf",
            use_auth_token=self._secrets["hf_access_token"],
            device_map="auto",
            torch_dtype=torch.float16,
        )
        self._tokenizer = LlamaTokenizer.from_pretrained(
            "meta-llama/Llama-2-70b-chat-hf",
            use_auth_token=self._secrets["hf_access_token"],
            torch_dtype=torch.float16,
        )

    def predict(self, model_input):
        prompt = model_input.pop("prompt")
        stream = model_input.pop("stream", False)
        # Remaining keys in model_input flow through as generation options.
        return self.forward(prompt, stream, **model_input)
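The input-handling pattern in `predict` above can be sketched with a stub in place of the heavy Llama model (the `EchoModel` and `forward` names below are illustrative, not part of the Truss API): known keys like `prompt` and `stream` are popped off, and everything else passes through as generation options.

```python
class EchoModel:
    """Stub standing in for the model class above; forward() just echoes."""

    def predict(self, model_input):
        prompt = model_input.pop("prompt")
        stream = model_input.pop("stream", False)
        # Leftover keys (e.g. max_new_tokens) become generation kwargs.
        return self.forward(prompt, stream, **model_input)

    def forward(self, prompt, stream, **generation_kwargs):
        return {"prompt": prompt, "stream": stream, "options": generation_kwargs}

result = EchoModel().predict({"prompt": "hi", "max_new_tokens": 64})
# result == {"prompt": "hi", "stream": False, "options": {"max_new_tokens": 64}}
```

Because unrecognized keys are forwarded rather than rejected, callers can tune generation parameters per request without any change to the model class.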
Highly performant infra that scales with you. We've built Baseten as a horizontally scalable service that takes you from prototype to production. As your traffic increases, our infrastructure automatically scales to keep up with it; there's no extra configuration required. Learn more about autoscaling
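The scaling behavior can be pictured with a toy rule (purely illustrative, not Baseten's actual policy): hold enough replicas to keep per-replica load at or below a target concurrency, and scale to zero when idle.

```python
import math

def desired_replicas(in_flight_requests: int,
                     target_concurrency: int = 4,
                     max_replicas: int = 10) -> int:
    """Toy autoscaling rule: ceil(load / per-replica capacity), capped."""
    if in_flight_requests <= 0:
        return 0  # scale to zero between bursts of traffic
    return min(max_replicas, math.ceil(in_flight_requests / target_concurrency))

desired_replicas(0)     # 0 — idle, no replicas running
desired_replicas(9)     # 3 — nine concurrent requests at four per replica
desired_replicas(1000)  # 10 — capped at max_replicas
```

A real autoscaler also smooths over time windows to avoid thrashing, but the core decision is this capacity calculation.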
Faster and better
We've optimized every step of the pipeline — building images, starting containers and caching models, provisioning resources, and fetching weights — to ensure models scale up from zero to ready for inference as quickly as possible.
Logs and health metrics. We've built Baseten to serve production-grade traffic to real users. We provide reliable logging and monitoring for every model deployed to ensure there's visibility into what's happening under the hood at every step. Learn more about logs and metrics
Resource management. Customize the infrastructure running your model. We provide access to the latest and greatest infrastructure to run your models on. It's easy to configure, and pricing is transparent. Learn more about customization
Select an instance type
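Instance selection maps to a short resources block in a Truss config.yaml; the values below are an illustrative sketch, not a recommendation for any particular model.

```yaml
resources:
  accelerator: A10G   # GPU type; omit for CPU-only serving
  cpu: "4"
  memory: 16Gi
  use_gpu: true
```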
Deploy on your own infrastructure or use AWS or GCP credits
Use your AWS or GCP credits on Baseten
Are you a startup with AWS and GCP credits? Use them on Baseten.
Deploy on your own infrastructure
Are you an enterprise that wants to self-host Baseten, or utilize compute across multiple clouds? Baseten is easily deployable inside any modern cloud. Your models and data don't need to leave your VPC.
Patreon saves nearly $600k/year in ML resources with Baseten
With Baseten, Patreon deployed and scaled the open-source foundation model Whisper at record speed without hiring an in-house ML infra team.
Laurel ships ML models 9+ months faster using Baseten
To automatically categorize hundreds of thousands of time entries every day, Laurel leverages sophisticated ML models and Baseten’s product suite.