Configure scale-down rate

You can now cap how aggressively the autoscaler removes replicas when traffic drops. Set max_scale_down_rate between 1% and 50% (default 50%) to limit the share of excess replicas removed at each scale-down step.
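To get a feel for how the cap changes the pace of a scale-down, here is a rough simulation. It is only a sketch under stated assumptions: the function name `simulate_scale_down` is hypothetical, and the model (each step removes the rate's share of the excess replicas, rounded up so that at least one replica is removed) is a guess at the behavior, not the autoscaler's actual algorithm.

```shell
#!/bin/sh
# Hypothetical model of capped scale-down; NOT the autoscaler's real logic.
# Assumption: excess = current - target, and each step removes
# ceil(excess * rate / 100) replicas.
simulate_scale_down() {
  current=$1   # replicas running now
  target=$2    # replicas the autoscaler wants
  rate=$3      # max_scale_down_rate as a whole-number percentage
  steps=0
  while [ "$current" -gt "$target" ]; do
    excess=$(( current - target ))
    # Integer ceiling of excess * rate / 100, so every step removes at least one.
    remove=$(( (excess * rate + 99) / 100 ))
    current=$(( current - remove ))
    steps=$(( steps + 1 ))
    echo "step ${steps}: removed ${remove}, ${current} replicas left"
  done
}

echo "rate 50:"
simulate_scale_down 20 4 50
echo "rate 20:"
simulate_scale_down 20 4 20
```

Under this assumed model, dropping from 20 replicas to 4 takes 5 steps at a rate of 50 and 9 steps at a rate of 20, so the lower rate keeps more replicas around for longer.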

Lower the rate to scale down more gradually and keep more replicas warm when traffic tends to rebound. Raise it toward 50% to release idle capacity faster.

For example, to cap scale-down at 20% for a model's production deployment:

curl -X PATCH \
    https://api.baseten.co/v1/models/$MODEL_ID/deployments/production/autoscaling_settings \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"max_scale_down_rate": 20}'
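
If you script this call, it can help to check the value against the 1 to 50 range before sending it. The helper below is a minimal sketch: `build_scale_down_payload` is a hypothetical name, not part of any Baseten tooling, and it assumes the API takes a whole-number percentage, as in the request above.

```shell
#!/bin/sh
# Hypothetical helper: validates the rate locally and prints the JSON body.
# Assumption: max_scale_down_rate is a whole number from 1 to 50.
build_scale_down_payload() {
  rate=$1
  case $rate in
    # Reject empty input, non-digits, and leading zeros (which also covers 0).
    ''|*[!0-9]*|0*)
      echo "max_scale_down_rate must be a whole number from 1 to 50, got '${rate}'" >&2
      return 1
      ;;
  esac
  if [ "$rate" -gt 50 ]; then
    echo "max_scale_down_rate must be at most 50, got ${rate}" >&2
    return 1
  fi
  printf '{"max_scale_down_rate": %s}\n' "$rate"
}

# Usage, reusing the request shown above:
#   payload=$(build_scale_down_payload 20) && curl -X PATCH \
#       https://api.baseten.co/v1/models/$MODEL_ID/deployments/production/autoscaling_settings \
#       -H "Authorization: Api-Key $BASETEN_API_KEY" \
#       -H "Content-Type: application/json" \
#       -d "$payload"
```

Checking locally only catches obvious mistakes early; the API remains the authority on which values it accepts.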

For more information, see our docs.