Plans and pricing
Pay for what you use
Only pay for the time your model is actively deploying, scaling up or down, or making predictions. Further calibrate autoscaling settings to save even more on compute resources.
Compute costs
CPU only
GPU
Pay-per-minute pricing
Calculate your monthly usage based on your model's anticipated uptime, autoscaling, and whether or not you plan to add a GPU.
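As a rough illustration of that calculation, here is a minimal sketch assuming per-minute billing for every active replica. The function name, its parameters, and the rates in the usage note are hypothetical placeholders, not published prices; substitute the per-minute rate for the instance type you actually choose.

```python
def estimate_monthly_cost(rate_per_minute, active_minutes_per_day,
                          avg_replicas=1.0, days=30):
    """Estimate one month's compute cost under per-minute billing.

    All inputs are assumptions supplied by the caller:
      rate_per_minute        -- price of the chosen instance type, per minute
      active_minutes_per_day -- minutes per day the model is deploying,
                                scaling, or serving predictions
      avg_replicas           -- average number of replicas running while
                                active (autoscaling can push this above 1)
      days                   -- number of days in the billing month
    """
    if min(rate_per_minute, active_minutes_per_day, avg_replicas, days) < 0:
        raise ValueError("inputs must be non-negative")
    if active_minutes_per_day > 24 * 60:
        raise ValueError("a day has at most 1440 minutes")

    # Each active replica is billed for every minute it runs.
    billed_minutes = active_minutes_per_day * avg_replicas * days
    return round(billed_minutes * rate_per_minute, 2)
```

For example, with a placeholder rate of 0.01 per minute, a model that is active two hours a day on a single replica gives `estimate_monthly_cost(0.01, 120)`, or 3,600 billed minutes for the month. Lowering `active_minutes_per_day` or `avg_replicas` shows how tighter autoscaling settings reduce the total.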
Choose the plan that's right for you
For starting up and scaling up
per month
For custom engagements and support at scale
Trusted by top engineering and machine learning teams


Commonly asked questions
Baseten is the simplest way to put a model behind an API or webapp hosted on fully managed, scalable infrastructure.
You control which GPUs your models use. We currently offer NVIDIA T4, A10, V100, and A100 GPUs. Contact us to learn more or to request additional GPU types.
Our servers are located in AWS data centers on the U.S. West Coast. More regions are being added to reduce global latency.
We bill for the time your model is active, by the minute. You have control over when each model is active, resource instance type, and autoscaling settings. After you use up your free credits, you’ll be asked to add a credit card to your account. At the end of each month, we’ll charge the card on file for your total usage throughout that month.
Yes. We offer on-premises deployments on our Enterprise plan. Contact us to learn more.
Data and workloads are hosted in AWS. All user workloads run in isolated environments, with isolation at both the hardware and network levels.
Yes, we offer significant volume discounts on model resources. Contact our sales team (link above) or reach out to us at hi@baseten.co to find out more.
Yes, we are happy to support ML efforts for education and non-profit organizations. Contact us to learn more.