Cache token pricing now available for Model APIs

Cached input tokens are billed at a discounted rate on Model APIs for all models (excluding GPT-OSS), starting April 17, 2026. Cache token pricing is applied automatically to the portion of each request that hits the KV cache. The number of cached tokens will be visible in the API response for every request.

We’re introducing cache token pricing to better fit the agentic workloads we serve on Model APIs. As a result, your bill should decrease in proportion to your cache hit rate, so workloads with high cache hit rates should see substantial savings. Cache token pricing is visible in-app and on our website.
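To make the relationship between cache hit rate and input cost concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, the prices are placeholder values rather than actual Model APIs rates, and it assumes the reported prompt token count includes the cached tokens. Check the pricing table and the API response for the real numbers and field names.

```python
def estimate_input_cost(prompt_tokens, cached_tokens,
                        input_price_per_m, cached_price_per_m):
    """Estimate the input-token cost (in dollars) of one request.

    Assumption: prompt_tokens is the total input token count and
    cached_tokens is the subset of it that hit the KV cache.
    Prices are in dollars per one million tokens.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    total = (uncached_tokens * input_price_per_m
             + cached_tokens * cached_price_per_m)
    return total / 1_000_000


def input_savings_fraction(hit_rate, discount):
    """Fraction of input cost saved, given a cache hit rate and a discount.

    hit_rate: share of input tokens served from cache (0.0 to 1.0).
    discount: relative price reduction for cached tokens (0.0 to 1.0).
    """
    return hit_rate * discount


# Placeholder prices, NOT actual rates: $1.00 per million input tokens,
# $0.25 per million cached input tokens (a 75% discount).
with_cache = estimate_input_cost(100_000, 80_000, 1.00, 0.25)
without_cache = estimate_input_cost(100_000, 0, 1.00, 0.25)
print(f"with cache: ${with_cache:.2f}, without cache: ${without_cache:.2f}")
print(f"input savings: {input_savings_fraction(0.8, 0.75):.0%}")
```

With these placeholder numbers, an 80% cache hit rate and a 75% discount reduce the input cost by 60%. Output tokens are not covered by this sketch, so the effect on the total bill also depends on how much of it comes from input tokens.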

For more information, see our pricing table or read our docs.

