Monitor concurrent inference requests
Track the number of in-progress inference requests across your deployments, covering both requests currently being served and those waiting in the queue. This count is the key signal used to drive autoscaling decisions, and it is now visible in the metrics dashboard and available through metrics export. For more information, see the supported metrics and autoscaling documentation.
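
As a minimal sketch of consuming the exported metric, the snippet below scrapes a Prometheus-style metrics endpoint and prints the current in-progress request count per label set. The endpoint URL and the metric name `concurrent_inference_requests` are illustrative assumptions, not the actual names; substitute the values listed in the supported metrics documentation.

```python
# Sketch: read the concurrent-request gauge from a metrics export endpoint.
# Assumes a Prometheus text-format exposition; URL and metric name are
# hypothetical placeholders -- replace with the real values from the docs.
import urllib.request

METRICS_URL = "http://localhost:9090/metrics"   # hypothetical export endpoint
METRIC_NAME = "concurrent_inference_requests"   # hypothetical metric name


def fetch_concurrent_requests(url: str = METRICS_URL) -> dict[str, float]:
    """Scrape the endpoint and return {series: value} for the gauge."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")

    samples: dict[str, float] = {}
    for line in body.splitlines():
        # Skip comment lines (# HELP / # TYPE) and unrelated metrics.
        if line.startswith("#") or not line.startswith(METRIC_NAME):
            continue
        # Prometheus text format: `name{labels} value`; this assumes samples
        # carry no trailing timestamp, which is typical for scrape targets.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in fetch_concurrent_requests().items():
        print(f"{series}: {value:.0f} in-progress requests")
```

Because the metric counts queued requests as well as those being served, alerting or autoscaling rules built on it will react to backlog growth before request latency degrades.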