Monitor concurrent inference requests
Track the number of in-progress inference requests across your deployments, covering both requests currently being served and those waiting in the queue. This count is the key signal used to drive autoscaling decisions, and it is now visible in the metrics dashboard and available through metrics export. For more information, see the supported metrics and autoscaling documentation.
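
As a minimal sketch of consuming the exported metric, the snippet below scrapes a Prometheus-style metrics endpoint and prints the current in-progress request count per label set. The endpoint URL and the metric name `concurrent_inference_requests` are illustrative assumptions, not the actual names; substitute the values listed in the supported metrics documentation.

```python
# Sketch: read the concurrent-request gauge from a metrics export endpoint.
# Assumes a Prometheus text-format exposition; URL and metric name are
# hypothetical placeholders -- replace with the real values from the docs.
import urllib.request

METRICS_URL = "http://localhost:9090/metrics"   # hypothetical export endpoint
METRIC_NAME = "concurrent_inference_requests"   # hypothetical metric name


def fetch_concurrent_requests(url: str = METRICS_URL) -> dict[str, float]:
    """Scrape the endpoint and return {series: value} for the gauge."""
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8")

    samples: dict[str, float] = {}
    for line in body.splitlines():
        # Skip comment lines (# HELP / # TYPE) and unrelated metrics.
        if line.startswith("#") or not line.startswith(METRIC_NAME):
            continue
        # Prometheus text format: `name{labels} value`; this assumes samples
        # carry no trailing timestamp, which is typical for scrape targets.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in fetch_concurrent_requests().items():
        print(f"{series}: {value:.0f} in-progress requests")
```

Because the metric counts queued requests as well as those being served, alerting or autoscaling rules built on it will react to backlog growth before request latency degrades.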