vLLM and SGLang metrics

Jun 10, 2026


Baseten now surfaces engine-native metrics for models served with vLLM or SGLang directly in the Metrics tab.
Baseten automatically detects the engine through your container's /metrics endpoint, then graphs metrics such as tokens per second, time to first token, KV cache usage, and the number of requests running or queued. No configuration or redeploy is required.
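
To make the detection step concrete, here is a minimal sketch of how an engine could be inferred from the text a /metrics endpoint returns. It is an illustration only, not Baseten's implementation. It assumes the engines expose Prometheus text-format metrics whose names start with `vllm:` or `sglang:`; the helper names `detect_engine` and `sample_value` are hypothetical.

```python
from collections import Counter
from typing import Optional

# Assumption: metric names carry an engine-specific prefix.
# Verify these prefixes against the engine version you run.
ENGINE_PREFIXES = {"vllm": "vllm:", "sglang": "sglang:"}


def detect_engine(metrics_text: str) -> Optional[str]:
    """Guess the serving engine from Prometheus text-format output."""
    counts = Counter()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        for engine, prefix in ENGINE_PREFIXES.items():
            if line.startswith(prefix):
                counts[engine] += 1
    if not counts:
        return None
    # Return the engine whose prefix appears on the most sample lines.
    return counts.most_common(1)[0][0]


def sample_value(metrics_text: str, name: str) -> Optional[float]:
    """Return the first sample value for an exact metric name, if present."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if line.startswith("#") or not line.startswith(name):
            continue
        rest = line[len(name):]
        if not rest or rest[0] not in "{ ":
            continue  # a longer metric name that shares this prefix
        if rest[0] == "{":
            # Drop the label set; the value follows the closing brace.
            rest = rest.rpartition("}")[2]
        fields = rest.split()
        if fields:
            return float(fields[0])
    return None
```

Given the body of a /metrics response as a string, `detect_engine(body)` returns `"vllm"`, `"sglang"`, or `None`, and `sample_value(body, "vllm:num_requests_running")` reads a single gauge. The parsing is deliberately simple; a full Prometheus client library handles escaping and timestamps more robustly.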

[Image: vLLM metrics]

You can also export these metrics to your own observability stack alongside Baseten's standard metrics.

For more information, see our docs.
