Pricing built for growth

Production inference that won't break your product or your bank.

Basic

Deploy custom, fine-tuned, and open-source models

Included in Basic:

Dedicated deployments

Model APIs

Training

Fast cold starts

SOC 2 Type II and HIPAA compliant

Email and in-app chat support

Deployment options

$0 per month, pay as you go

Pro

Unlimited autoscaling and priority compute access

Everything in Basic plus:

Priority access to high-demand GPUs

Dedicated compute

Higher Model API rate limits

Hands-on engineering expertise

Dedicated support on Slack and Zoom

Deployment options

Volume discounts available

Enterprise

Full control in your cloud and ours

Everything in Pro plus:

Custom SLAs

Self-host deployments

On-demand flex compute

Use existing cloud commitments

Full control over data residency

Advanced security and compliance

Custom global regions

Advanced RBAC with Teams

Deployment options

Baseten, Your VPC, Hybrid

Volume discounts available

Pricing

Best-in-class model performance, effortless autoscaling, and blazing-fast cold starts mean you get the most out of each GPU, saving money along the way.

Model APIs

Instant access to pre-optimized models running on the Baseten Inference Stack.

Price per 1M tokens

Model                      Input    Cache Input   Output
GLM 5.2                    $1.40    $0.26         $4.40
GLM 5.1                    $1.30    $0.26         $4.30
GLM 5                      $0.95    $0.20         $3.15
GLM 4.7                    $0.60    $0.12         $2.20
Kimi K2.7 Code             $0.95    $0.16         $4.00
Kimi K2.6                  $0.95    $0.16         $4.00
Kimi K2.5                  $0.60    $0.12         $3.00
NVIDIA Nemotron 3 Ultra    $0.60    $0.12         $2.40
NVIDIA Nemotron 3 Super    $0.30    $0.06         $0.75
DeepSeek V4                $1.74    $0.145        $3.48
GPT OSS 120B               $0.10    -             $0.50
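As a rough illustration of how per-1M-token prices translate into a bill, the sketch below estimates the cost of a workload using the GLM 4.7 rates listed above. The function name `estimate_cost` is hypothetical, and the billing rule it encodes (cached input tokens charged at the Cache Input rate, all other input tokens at the Input rate) is an assumption for illustration, not something this page states.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the GLM 4.7 row above.
GLM_4_7_PRICES = {
    "input": Decimal("0.60"),
    "cache_input": Decimal("0.12"),
    "output": Decimal("2.20"),
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(prices, input_tokens, cached_input_tokens, output_tokens):
    """Estimate the USD cost of a workload.

    Assumption (not confirmed by the page): cached input tokens are billed
    at the Cache Input rate and the remaining input tokens at the Input rate.
    """
    usage = {
        "input": input_tokens,
        "cache_input": cached_input_tokens,
        "output": output_tokens,
    }
    return sum(
        (prices[kind] * Decimal(count) / TOKENS_PER_PRICE_UNIT
         for kind, count in usage.items()),
        Decimal(0),
    )


# 2M uncached input + 1M cached input + 0.5M output tokens:
# 1.20 + 0.12 + 1.10 = 2.42 USD
total = estimate_cost(GLM_4_7_PRICES, 2_000_000, 1_000_000, 500_000)
print(f"${total:.2f}")
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift.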

Dedicated Deployments

Only pay for the compute you use, down to the minute.

Volume discounts available

Price per

GPU Instances            Price
T4 (16 GiB VRAM)         $0.01052
L4 (24 GiB VRAM)         $0.01414
A10G (24 GiB VRAM)       $0.02012
A100 (80 GiB VRAM)       $0.06667
H100 MIG (40 GiB VRAM)   $0.0625
H100 (80 GiB VRAM)       $0.10833
B200 (180 GiB VRAM)      $0.16633

CPU Instances                   Price
1x2 (1 vCPU, 2 GiB RAM)         $0.00058
1x4 (1 vCPU, 4 GiB RAM)         $0.00086
2x8 (2 vCPUs, 8 GiB RAM)        $0.00173
4x16 (4 vCPUs, 16 GiB RAM)      $0.00346
8x32 (8 vCPUs, 32 GiB RAM)      $0.00691
16x64 (16 vCPUs, 64 GiB RAM)    $0.01382
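To compare these rates with hourly prices from other providers, a small conversion helps. The sketch below assumes the listed GPU prices are in USD per minute, which the "down to the minute" wording suggests but the page text does not state outright. The helper names `hourly_rate` and `usage_cost` are hypothetical.

```python
from decimal import Decimal

# GPU prices from the list above.
# Assumption: the unit is USD per minute per instance.
GPU_PRICE_PER_MINUTE = {
    "T4": Decimal("0.01052"),
    "L4": Decimal("0.01414"),
    "A10G": Decimal("0.02012"),
    "A100": Decimal("0.06667"),
    "H100 MIG": Decimal("0.0625"),
    "H100": Decimal("0.10833"),
    "B200": Decimal("0.16633"),
}


def hourly_rate(instance):
    """Convert the assumed per-minute price to an hourly price."""
    return GPU_PRICE_PER_MINUTE[instance] * 60


def usage_cost(instance, minutes, replicas=1):
    """Cost of running `replicas` instances for `minutes` billed minutes each."""
    return GPU_PRICE_PER_MINUTE[instance] * minutes * replicas


print(hourly_rate("H100"))      # 6.49980
print(usage_cost("T4", 90))     # 0.94680
```

Under that assumption, an H100 works out to roughly $6.50 per hour of active compute.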

Talk to Sales about compute in other countries and regions.

Training

On-demand compute, devex, and infrastructure for your training jobs.

Volume discounts available

Price per

GPU Instances            Price
T4 (16 GiB VRAM)         $0.01052
L4 (24 GiB VRAM)         $0.01414
A10G (24 GiB VRAM)       $0.02012
A100 (80 GiB VRAM)       $0.06667
H100 MIG (40 GiB VRAM)   $0.0625
H100 (80 GiB VRAM)       $0.10833
B200 (180 GiB VRAM)      $0.16633

Talk to Sales about compute in other countries and regions.

Common questions

You can deploy open-source and custom models on Baseten. Start with an off-the-shelf model from our model library, or deploy any model using Truss, our open-source standard for packaging and serving models built in any framework.

You have control over what GPUs your models use. See our instance type reference for a full list of the GPUs currently available on Baseten. Reach out to us to request additional GPU types.

Yes, new Baseten accounts come with credits so you can get to know the UI and experiment with deployments for free.

Yes, Baseten is SOC 2 Type II certified and HIPAA compliant. You can read more about our SOC 2 Type II certification here. And you can read more about our HIPAA compliance here.

No, you do not pay for idle time – you only pay for the time your model is using compute on Baseten. This includes the time your model is actively deploying, scaling up or down, or making predictions. And you have full control over how your model scales up or down.
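To make the idle-time rule concrete, the sketch below totals only the periods in which a model is active (deploying, scaling, or serving) and ignores the gaps between them. Several details are assumptions for illustration: the function name `billable_minutes`, the rule that each active period rounds up to a whole minute, and the reading of the A10G price as USD per minute. The timestamps are made-up sample data.

```python
import math
from datetime import datetime
from decimal import Decimal


def billable_minutes(active_periods):
    """Sum active time only; idle gaps between periods are never counted.

    Assumption: each active period is rounded up to a whole minute.
    """
    return sum(
        math.ceil((end - start).total_seconds() / 60)
        for start, end in active_periods
    )


# Hypothetical activity for one replica over a day (sample data).
periods = [
    (datetime(2025, 1, 1, 9, 0, 0), datetime(2025, 1, 1, 9, 12, 30)),   # 12.5 min, billed as 13
    (datetime(2025, 1, 1, 14, 0, 0), datetime(2025, 1, 1, 14, 45, 0)),  # 45 min
]

minutes = billable_minutes(periods)   # 13 + 45 = 58
cost = Decimal("0.02012") * minutes   # A10G price, assumed to be per minute
print(minutes, cost)
```

The hours between 9:12 and 14:00 contribute nothing to the total, which is the point of the rule described above.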

Customer support levels vary by plan. We offer email, in-app chat, Slack, and Zoom support. We also offer dedicated forward-deployed engineering support. Reach out to our team to figure out a customer support level that works for your needs.

Yes, discounts on compute can be negotiated as part of our Pro and Enterprise plans. Reach out to our team to learn more.

Yes, you can self-host Baseten in order to manage security and use your own cloud commitments. Talk to our engineers to learn more.
