Configure scale-down rate
You can now cap how aggressively the autoscaler removes replicas when traffic drops. Set max_scale_down_rate between 1% and 50% (default 50%) to limit the share of excess replicas removed at each scale-down step.
Lower the rate to scale down more gradually and keep more replicas warm when traffic tends to rebound. Raise it toward 50% to release idle capacity faster.
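To get a feel for how the rate changes the pace of scale-down, here is a rough Bash sketch. It is an illustrative model, not the autoscaler's actual implementation: the function name simulate_scale_down is hypothetical, and it assumes that each step removes the rate's share of the excess replicas (current minus target), rounded up, while the target stays fixed.

```shell
#!/usr/bin/env bash
# Illustrative model only, NOT the autoscaler's real algorithm.
# Assumptions: excess = current - target; each step removes
# ceil(rate% of excess) replicas; the target does not change mid-way.

# simulate_scale_down CURRENT TARGET RATE_PERCENT
# Prints the replica count after each step, space-separated on one line.
simulate_scale_down() {
  local current=$1 target=$2 rate=$3
  local excess remove out="$current"
  while (( current > target )); do
    excess=$(( current - target ))
    # Integer ceiling of excess * rate / 100, so every step removes >= 1
    remove=$(( (excess * rate + 99) / 100 ))
    current=$(( current - remove ))
    out+=" $current"
  done
  echo "$out"
}

simulate_scale_down 12 2 50   # prints: 12 7 4 3 2
simulate_scale_down 12 2 20   # prints: 12 10 8 6 5 4 3 2
```

Under these assumptions, dropping from 12 replicas to 2 takes four steps at 50% and seven steps at 20%, so the lower rate keeps more replicas available if traffic returns partway through.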
curl -X PATCH \
  https://api.baseten.co/v1/models/$MODEL_ID/deployments/production/autoscaling_settings \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"max_scale_down_rate": 20}'

For more information, see our docs.