Track both active and starting up replicas

The replica count chart on the model metrics page is now broken out into “active” and “starting up” replicas.

  • An active replica has loaded the model for inference and is actively responding to traffic.

  • A replica is starting up if it’s been created by the autoscaler to handle additional traffic, but isn’t yet ready to respond to requests.

Once a replica finishes starting up, it becomes active. When an active replica is no longer needed, it deactivates and drops out of the replica count.
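The lifecycle described above can be read as a small state machine. The following is a minimal illustrative sketch, not the platform's implementation: the names `ReplicaState`, `TRANSITIONS`, `advance`, and `chart_counts` are hypothetical, and only the transitions stated in this note (starting up to active, active to deactivated) are modeled.

```python
from collections import Counter
from enum import Enum


class ReplicaState(Enum):
    """Hypothetical states mirroring the lifecycle described in this note."""
    STARTING_UP = "starting up"   # created by the autoscaler, not yet ready for requests
    ACTIVE = "active"             # model loaded and responding to traffic
    DEACTIVATED = "deactivated"   # no longer needed


# Assumption: only the transitions stated in the text are allowed.
TRANSITIONS = {
    ReplicaState.STARTING_UP: {ReplicaState.ACTIVE},
    ReplicaState.ACTIVE: {ReplicaState.DEACTIVATED},
    ReplicaState.DEACTIVATED: set(),
}


def advance(current: ReplicaState, target: ReplicaState) -> ReplicaState:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def chart_counts(states: list[ReplicaState]) -> dict[str, int]:
    """Count the two series shown on the chart; deactivated replicas are excluded."""
    counts = Counter(states)
    return {
        "active": counts[ReplicaState.ACTIVE],
        "starting up": counts[ReplicaState.STARTING_UP],
    }


if __name__ == "__main__":
    replicas = [
        ReplicaState.ACTIVE,
        ReplicaState.ACTIVE,
        ReplicaState.STARTING_UP,
        ReplicaState.DEACTIVATED,
    ]
    print(chart_counts(replicas))
```

In this sketch, a snapshot with two active replicas, one starting up, and one deactivated yields two series, and the deactivated replica is left out of both.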

The replicas chart shows both active and starting up replicas.