Permit inference on unhealthy models

A model enters an “unhealthy” state when its deployment is active but runtime errors occur, such as downtime in an external dependency.

Inference requests now proceed even when a model is marked unhealthy; previously, such requests returned a 500 status. This change maximizes availability by letting partially operational models continue serving traffic even when a problem affects overall model health.
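The change can be sketched as a routing decision in the serving layer. The names and data shapes below are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of the serving-layer change. Model state, the
# handler signature, and response shapes are all illustrative.

def handle_inference(model, *, permit_unhealthy=True):
    """Route an inference request, returning (status_code, body)."""
    if model["state"] == "inactive":
        return 503, {"error": "deployment not active"}
    if model["state"] == "unhealthy" and not permit_unhealthy:
        # Old behavior: reject outright with a 500.
        return 500, {"error": "model unhealthy"}
    # New behavior: attempt inference anyway; the request can still fail
    # if this particular code path hits the broken dependency.
    return 200, {"prediction": model["predict"]()}

unhealthy_model = {"state": "unhealthy", "predict": lambda: 0.87}

old_status, _ = handle_inference(unhealthy_model, permit_unhealthy=False)
new_status, _ = handle_inference(unhealthy_model)
print(old_status)  # 500
print(new_status)  # 200
```

In this sketch the unhealthy flag no longer gates the request; it only drives admin notifications, while individual requests succeed or fail on their own merits.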

Workspace admins are still notified when a model enters an unhealthy state. See the docs for guidance on diagnosing and fixing unhealthy models.