Oct 31, 2023

Overhauled model management experience

We've made some big changes to the model management experience to clarify the model lifecycle and better follow concepts you're already familiar with as a developer. These changes aren't breaking. They make it easier to deploy and serve your models with strong performance, scalability, and cost efficiency.

If you’ve already deployed models on Baseten and know your way around the old model overview pages, you’ll notice some changes including:

Deployments: Draft and primary versions are now development and production deployments.
Calling your model: You can now easily test your model in Baseten by calling it from the overview page.
New predict endpoints: We've added new predict endpoints with a simpler response format. You will need to change the way you parse your model output if you decide to switch to the new endpoints.
Observability and control: The model overview page includes additional model metadata and new actions you can take on each deployment.

Deployments

We’ve moved away from semantic versioning in favor of deployments with deployment IDs.

A development deployment in the process of being promoted to production

The new model overview page highlights two special deployments:

Development deployment (formerly draft version): This deployment is designed for quick iteration and testing, with live reload so you can patch changes onto the model server while it runs. It’s limited to a maximum of one replica and always scales to zero after inactivity.
Production deployment (formerly primary version): Promote your development deployment to production when you’re ready to use your model for a production use case. The production deployment and all other published deployments have access to full autoscaling to meet the demands of high and variable traffic.

The deployments section of the model overview page

All of your deployments are listed beneath the development and production deployments:

For workspaces with multiple users, you now get visibility into who in your workspace created each model deployment.
You can set different autoscaling settings for each deployment and get at-a-glance visibility into how many replicas are scaled up at a given time.
We’ve added new actions to model deployments. In addition to activating, deactivating, and deleting deployments, click into the action menu to:
- Wake the deployment if it’s scaled to zero.
- Download the deployment’s Truss.
- Stop an in-progress deployment.
- Manage the deployment’s autoscaling settings.

Calling your model

The call model modal where you can find deployment endpoints and call the model

The predict endpoint format has changed so that you can call the current development or production deployment without referencing a specific deployment ID. The old endpoints will keep working, so you can continue using them if you’d like. Here’s how the new endpoints are formatted:

To call the current production deployment: https://model-<model-id>.api.baseten.co/production/predict
To call the current development deployment: https://model-<model-id>.api.baseten.co/development/predict
To call another deployment: https://model-<model-id>.api.baseten.co/deployment/<deployment-id>/predict
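
As a minimal sketch, the URL patterns above can be built with a small helper and turned into a request with Python's standard library. The names `predict_url` and `build_request`, the placeholder IDs `abc123` and `dep456`, and the sample payload are hypothetical. The `Api-Key` authorization header format is an assumption, not something stated in this post, so confirm it against the API docs before use.

```python
import json
import urllib.request
from typing import Any, Optional


def predict_url(model_id: str, target: str = "production",
                deployment_id: Optional[str] = None) -> str:
    """Build a predict URL following the endpoint patterns listed above."""
    base = f"https://model-{model_id}.api.baseten.co"
    if target in ("production", "development"):
        return f"{base}/{target}/predict"
    if target == "deployment":
        if not deployment_id:
            raise ValueError("deployment_id is required when target is 'deployment'")
        return f"{base}/deployment/{deployment_id}/predict"
    raise ValueError(f"unknown target: {target!r}")


def build_request(url: str, api_key: str, payload: Any) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with a JSON body.

    Assumption: the API key is sent as "Authorization: Api-Key <key>".
    """
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical IDs, used only to show the three URL shapes.
print(predict_url("abc123"))
print(predict_url("abc123", "development"))
print(predict_url("abc123", "deployment", "dep456"))
```

To send a prepared request, pass it to `urllib.request.urlopen` and read the response body. Keeping URL construction separate from the network call makes it easy to switch between development and production without touching the rest of your client code.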

With the new model endpoints, we’ve also changed the output format of the model response. New responses are no longer wrapped in a JSON dictionary, which removes a step in parsing model output. You only need to change the way you parse your model output if you switch to these new endpoints.
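
If you call both old and new endpoints during a migration, one option is a small helper that accepts either response shape. This is a sketch under an assumption: it treats the old wrapper as a JSON dictionary holding the result under a `model_output` key. That key name and the helper name `extract_output` are assumptions rather than details from this post, so inspect a real response from your old endpoint before relying on it.

```python
import json
from typing import Any


def extract_output(body: str, wrapper_key: str = "model_output") -> Any:
    """Return the model output from a raw JSON response body.

    Assumption: old endpoints wrap the output in a dict under `wrapper_key`,
    while new endpoints return the output directly.
    """
    data = json.loads(body)
    if isinstance(data, dict) and wrapper_key in data:
        return data[wrapper_key]  # old-style, wrapped response
    return data  # new-style, unwrapped response


# Illustrative response bodies, not captured from a real model.
old_style = '{"model_id": "abc123", "model_output": {"label": "cat"}}'
new_style = '{"label": "cat"}'

print(extract_output(old_style))
print(extract_output(new_style))
```

One caveat with this approach: if your model's own output is a dictionary that happens to contain the wrapper key, the helper will unwrap it by mistake. Once you have fully switched to the new endpoints, parsing the response body directly is simpler and avoids that ambiguity.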

Call a model deployment from within Baseten

We've also added new functionality to the "Call model" dialog:

You can generate an API key in one click instead of going to your account settings.
You can now test your model by calling it from the model overview page within Baseten. Model library models even come with sample inputs you can run right away.

Observability and control

You’ll notice a handful of other changes as you explore your model overview pages:

Metrics have moved to their own tab. The overview tab still provides high-level visibility into the number of calls and median response time in the last hour. Click over to the metrics tab to dive deeper into model traffic, performance, and GPU usage.
When choosing an instance type, you can view pricing per minute, per hour, or per day.
You can quickly see a running total of how much you’ve spent on the model this billing period and then drill deeper into usage costs by deployment and instance type.

The model metrics tab showing end-to-end response time in the last hour

For more on model management features on Baseten, check out all-new docs for:

Publishing to production and the model lifecycle
API endpoints for inference and more
Autoscaling settings and inference types
Model metrics and model health

That’s it for now. We’re eager to know what you think. Please reach out to us with questions and feedback on these changes – we’re at support@baseten.co.