Plans and pricing
Pay for what you use
Pay only for the time your model is actively deploying, scaling up or down, or making predictions. Tune autoscaling settings to save even more on compute resources.
Compute costs
CPU-only
GPU
Choose the plan that's right for you
For starting up and scaling up
per month
For custom engagements and support at scale
Trusted by top data science and machine learning teams

Commonly asked questions
Baseten is the simplest way to put a model behind an API or web app hosted on fully managed, scalable infrastructure.
You have control over what GPUs your models use. We currently offer NVIDIA T4, NVIDIA A10, and NVIDIA V100 GPUs. Contact us to learn more.
Our servers are located on the U.S. West Coast in AWS data centers. More regions are being added to reduce global latency.
We bill by the minute for the time your model is active. You control when each model is active, which instance type it runs on, and its autoscaling settings. When you first start using model resources, you’ll be asked to add a credit card to your account. At the end of each month, we’ll charge the card on file for your total usage that month.
Yes. We offer on-premises deployments on our Enterprise plan. Contact us to learn more.
Data and workloads are hosted in AWS. All user workloads run in isolated environments, with isolation at both the hardware and network levels.
Yes. Yearly plans are 10% off. Contact us to set one up.
Yes, we are happy to support ML efforts for education and non-profit organizations. Contact us to learn more.
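To make the per-minute billing described above concrete, here is a rough sketch of how a month's usage charge could be estimated. The rates, the instance-type keys, and the function name are hypothetical placeholders, not Baseten's actual prices or API. The sketch also assumes usage is already counted in whole minutes.

```python
from decimal import Decimal

# Hypothetical per-minute rates in USD. These are illustrative placeholders,
# not real prices; substitute the rates shown for your instance types.
HYPOTHETICAL_RATES_PER_MINUTE = {
    "cpu-only": Decimal("0.001"),
    "gpu": Decimal("0.02"),
}


def estimate_monthly_charge(active_minutes_by_instance):
    """Sum (active minutes x per-minute rate) across instance types.

    `active_minutes_by_instance` maps an instance-type key to the number of
    whole minutes models on that instance type were active during the month.
    """
    total = Decimal("0")
    for instance_type, minutes in active_minutes_by_instance.items():
        rate = HYPOTHETICAL_RATES_PER_MINUTE[instance_type]
        total += rate * minutes
    # Round to cents for display.
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Example: 600 active CPU minutes and 90 active GPU minutes in one month.
    usage = {"cpu-only": 600, "gpu": 90}
    print(estimate_monthly_charge(usage))
```

Because only active minutes are counted, autoscaling settings that shorten the time a model stays active lower the estimate directly.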