vLLM and SGLang metrics

Baseten now surfaces engine-native metrics for models served with vLLM or SGLang directly in the Metrics tab.
Baseten automatically detects the engine through your container's /metrics endpoint, then graphs metrics such as tokens per second, time to first token, KV cache usage, and running or queued requests. No configuration or redeploy is required.
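
To make the detection step concrete, here is a minimal sketch of how an engine could be identified from the Prometheus-style text that a /metrics endpoint returns. It is not Baseten's implementation, which this post does not describe. It assumes that vLLM metric names start with `vllm:` and SGLang metric names start with `sglang:`, which you should verify against your engine version. The helper names `detect_engine` and `read_sample` and the `SAMPLE` payload are hypothetical.

```python
import re

# Assumption: engine metrics are namespaced with these prefixes.
# Check the /metrics output of your own engine version to confirm.
ENGINE_PREFIXES = {"vllm": "vllm:", "sglang": "sglang:"}

# Matches a Prometheus sample line: name, optional {labels}, then a value.
SAMPLE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{.*\})?\s+(\S+)")


def iter_samples(metrics_text):
    """Yield (metric_name, value) pairs, skipping comments and blank lines."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            yield match.group(1), float(match.group(2))


def detect_engine(metrics_text):
    """Return 'vllm', 'sglang', or None based on metric name prefixes."""
    for name, _ in iter_samples(metrics_text):
        for engine, prefix in ENGINE_PREFIXES.items():
            if name.startswith(prefix):
                return engine
    return None


def read_sample(metrics_text, metric_name):
    """Return the first value reported for metric_name, or None."""
    for name, value in iter_samples(metrics_text):
        if name == metric_name:
            return value
    return None


# Hypothetical payload, shaped like a vLLM /metrics response.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo"} 3.0
vllm:num_requests_waiting{model_name="demo"} 1.0
"""

engine = detect_engine(SAMPLE)
queued = read_sample(SAMPLE, "vllm:num_requests_waiting")
print(engine, queued)
```

In practice you would fetch the text from your container's /metrics endpoint rather than use an inline string. A payload with neither prefix makes `detect_engine` return `None`, which a caller could treat as "no engine-native metrics available".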

[Image: vLLM metrics]

You can also export these metrics to your own observability stack alongside Baseten's standard metrics.

For more information, see our docs.