vLLM and SGLang metrics
Baseten now surfaces engine-native metrics for models served with vLLM or SGLang directly in the Metrics tab.
Baseten automatically detects the engine through your container's /metrics endpoint, then graphs metrics such as tokens per second, time to first token, KV cache usage, and running and queued requests, with no configuration or redeploy required.
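As a rough illustration of what reading such an endpoint involves, here is a minimal sketch. It assumes the engine serves Prometheus text format at `/metrics` and uses the metric names shown in the code. Those names vary across vLLM and SGLang versions, so treat them as assumptions. The prefix-based engine detection and every function name here are hypothetical. This is not Baseten's implementation.

```python
import re
import urllib.request
from typing import Dict, Optional

# Prometheus text-format sample line: name{labels} value [timestamp].
# Simplification: label values containing "}" are not handled.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')

# Assumed metric names (hypothetical mapping; check your engine version).
QUEUE_METRICS = {
    "vllm": ("vllm:num_requests_running", "vllm:num_requests_waiting"),
    "sglang": ("sglang:num_running_reqs", "sglang:num_queue_reqs"),
}


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> str:
    """Fetch raw metrics text; the URL and port are assumptions."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> Dict[str, float]:
    """Sum sample values per metric name across all label sets.

    Summing is fine for the gauges and _sum/_count series used below,
    but it is NOT meaningful for histogram _bucket series.
    """
    totals: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = _SAMPLE.match(line)
        if m is None:
            continue
        name, value = m.group(1), float(m.group(3))
        totals[name] = totals.get(name, 0.0) + value
    return totals


def detect_engine(samples: Dict[str, float]) -> Optional[str]:
    """Hypothetical heuristic: infer the engine from the metric name prefix."""
    for name in samples:
        if name.startswith("vllm:"):
            return "vllm"
        if name.startswith("sglang:"):
            return "sglang"
    return None


def summarize(text: str) -> dict:
    samples = parse_samples(text)
    engine = detect_engine(samples)
    if engine is None:
        return {"engine": None}
    running_name, queued_name = QUEUE_METRICS[engine]
    summary = {
        "engine": engine,
        "running": samples.get(running_name),
        "queued": samples.get(queued_name),
    }
    # Mean time to first token from the histogram's _sum and _count series.
    ttft_sum = samples.get(f"{engine}:time_to_first_token_seconds_sum")
    ttft_count = samples.get(f"{engine}:time_to_first_token_seconds_count")
    if ttft_sum is not None and ttft_count:
        summary["mean_ttft_s"] = ttft_sum / ttft_count
    return summary


# Usage with a small hand-written sample instead of a live endpoint.
SAMPLE = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="m"} 3.0
vllm:num_requests_waiting{model_name="m"} 1.0
vllm:time_to_first_token_seconds_sum{model_name="m"} 2.5
vllm:time_to_first_token_seconds_count{model_name="m"} 10.0
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

With the sample above, `summarize` reports the engine as `vllm`, 3 running and 1 queued request, and a mean time to first token of 0.25 s. A real dashboard would compute rates over time, such as tokens per second from counter deltas, rather than use single snapshots.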
You can also export these metrics to your own observability stack alongside Baseten's standard metrics.
For more information, see our docs.