Introducing Custom Servers: Deploy production-ready model servers from Docker images
Deploy production-ready model servers on Baseten directly from any Docker image using just a YAML file.
Create custom environments for deployments on Baseten
Test and deploy ML models reliably with production-ready custom environments, persistent endpoints, and seamless CI/CD.
Introducing canary deployments on Baseten
Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.
Using asynchronous inference in production
Learn how async inference works, how it protects against common inference failures, and how it's applied in common use cases.
Baseten Chains explained: building multi-component AI workflows at scale
A delightful developer experience for building and deploying compound ML inference workflows.
New in May 2024
AI events, multicluster model serving architecture, tokenizer efficiency, and forward-deployed engineering.
New in April 2024
Use four new best-in-class LLMs, stream synthesized speech with XTTS, and deploy models with CI/CD.
New in March 2024
Fast Mistral 7B, fractional H100 GPUs, FP8 quantization, and API endpoints for model management.
New in February 2024
3x throughput with H100 GPUs, 40% lower SDXL latency with TensorRT, and multimodal open source models.
New in January 2024
A library for open source models, general availability for L4 GPUs, and performance benchmarking for ML inference.