Changelog

Our latest product additions and improvements.


Jun 1, 2023

One of the slowest parts of deploying a model—whether for the first time or as a cold start for a scaled-to-zero model service—is downloading the model weights. These files can exceed 10 GB for some common foundation models.

We developed a network accelerator to speed up model loads from common model artifact stores, including HuggingFace, CloudFront, S3, and OpenAI. Our accelerator runs byte-range downloads in the background to maximize download parallelism.
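Under the hood, this is the standard HTTP range-request technique: split a large file into chunks and fetch the chunks concurrently. Here's a minimal sketch of the general approach in Python (not our implementation; the chunk size and worker count are illustrative):

import concurrent.futures
import requests

def download_parallel(url, path, chunk_size=64 * 1024 * 1024, workers=8):
    # Find the total file size, following redirects (e.g., HuggingFace -> CDN)
    size = int(requests.head(url, allow_redirects=True).headers["Content-Length"])
    ranges = [(start, min(start + chunk_size, size) - 1)
              for start in range(0, size, chunk_size)]

    def fetch(byte_range):
        start, end = byte_range
        # RFC 7233 byte-range request; S3, CloudFront, and most CDNs support this
        resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
        resp.raise_for_status()
        return start, resp.content

    # Fetch chunks in parallel and write each one at its offset
    with open(path, "wb") as f:
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            for start, chunk in pool.map(fetch, ranges):
                f.seek(start)
                f.write(chunk)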

The network accelerator speeds up downloading large model files by 500-600%.

The network accelerator uses a proxy and sidecar to speed up model downloads. These services run on our SOC 2 Type II certified and HIPAA compliant infrastructure. If you prefer to disable network acceleration for your Baseten workspace, contact our support team at support@baseten.co and we will turn the feature off.

May 26, 2023

Deploy the latest open-source models like WizardLM, Alpaca, Bark, Whisper, Stable Diffusion, and more from the refreshed and restocked model library.

Baseten's model library offers quick deployments of open-source foundation models

Previously, model library models deployed to your account used a shared instance of the model. This meant you couldn’t adjust resource configurations or view logs and metrics. Now, models from the model library are deployed directly to instances in your workspace, giving you full access to Baseten's model management features.

Existing shared instance model deployments will continue to operate as before, but all new deployments from the model library will use the standard deployment method.

For more on deploying and managing model library models, see the updated documentation.

May 2, 2023

The billing page in your workspace settings has two new capabilities: a model usage dashboard and invoice history panel.

The model usage dashboard breaks down your bill by model

Your model usage dashboard breaks down the billable time and total cost of each active model in your workspace. Usage is tracked by version for models with multiple versions, including deleted versions.

If your account has credits, they will be applied against your bill automatically and shown in the model usage dashboard.

The invoices panel tracks prior invoices

Use the invoices panel to review and download invoices from previous billing periods. Total bills shown are net of free credits.

Apr 28, 2023

Baseten has transitioned to purely usage-based pricing for all workspaces not on our Enterprise plan. There is no monthly or annual platform fee for workspaces on the default Startup plan.

Plus, we’re offering $30 in free credits for all new workspaces and for existing workspaces that transition to the Startup plan.

Usage-based pricing

With usage-based pricing, you only pay for the time your model is deploying, active, or scaling down. Usage-based pricing also applies to fine-tuning runs.

Pricing depends on the instance type your model is running on. Use our pricing calculator to estimate the monthly cost of your workload.
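As a back-of-the-envelope example: at a hypothetical rate of $1.00 per hour for a GPU instance, one replica that is active 8 hours a day for 30 days would cost about 1.00 × 8 × 30 = $240 for the month.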

You can monitor your usage from the billing tab in your workspace settings.

Baseten platform features

With usage-based pricing, workspaces on the Startup plan can access all model deployment and management features with no limit on the number of models and versions you can add to your Baseten account.


We also offer an Enterprise plan for organizations that need dedicated support and more customizability. If you have any questions about pricing or which plan is best for your organization, please get in touch with us at support@baseten.co.

Mar 15, 2023

Old versus new model invocation syntax

We paid down some technical debt and, in doing so, removed a papercut from the Baseten and Truss developer experience.

It used to be that all model invocations had to be formatted as:

{
    "inputs": $MODEL_INPUT
}

We cleaned up the model packaging templates in Truss so that you can invoke your model on any JSON-serializable input. Now, pass your lists, strings, numbers, and more directly to your model. In other words, now you can simply call your model with $MODEL_INPUT.
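To illustrate the difference (the endpoint and API key below are placeholders; see your model page for the exact invocation details):

import requests

url = "https://app.baseten.co/model_versions/VERSION_ID/predict"  # placeholder
headers = {"Authorization": "Api-Key YOUR_API_KEY"}  # placeholder

# Before: inputs had to be wrapped in a JSON object
requests.post(url, headers=headers, json={"inputs": [1, 2, 3]})

# Now: pass any JSON-serializable value directly
requests.post(url, headers=headers, json=[1, 2, 3])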

This change only affects newly created model versions that use truss > 0.4.0 and baseten > 0.5.0, so your invocations of existing models won’t break. And you can always edit the predict() and preprocess() functions in your Truss’ model.py to change your model input format.

Feb 3, 2023

Real-world model deployment workflows can be a bit messy. The latest version of the Baseten Python client, version 0.3.0, gives you a cleaner path to production by introducing a live reload workflow to every model deployment. Live reload lets you test changes to your model in seconds rather than waiting for Docker to rebuild and your model to be redeployed.

Now, every deployed model starts as a draft model by default. Draft models support live reload for:

  • Updates to model serving code

  • Updates to required Python packages

  • Updates to required system packages

And other changes, like editing environment variables and updating your model binary, can still be made in the draft stage without incrementing a model version, though they will require waiting for a full redeployment.

Live reload saves time when iterating on draft models

Once you’re satisfied with your model, you can publish it to production resources. To publish your model, just pass publish=True as an argument to baseten.deploy(). You can also publish your model in the Baseten UI.

You can use this flag during your initial deployment to skip the draft model step if desired.
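Here's a minimal sketch, assuming a Truss packaged in a local directory (the directory and model name are hypothetical):

import baseten
import truss

my_model = truss.load("./my_truss")  # hypothetical local Truss directory

# publish=True skips the draft stage and deploys straight to production resources
baseten.deploy(my_model, model_name="My model", publish=True)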

For a complete guide to deploying models, read our newly refreshed model deployment docs.

Jan 24, 2023

Model build and deployment logs

The only thing more frustrating than your code not working is not knowing why it isn't working. To make it easier to root-cause issues during model deployment and invocation, we separated build logs from deployment and prediction logs.

View active replicas

A graph showing the number of active replicas over time for a model

When you enable autoscaling in your model resources, Baseten dynamically adjusts the number of replicas of your model to meet traffic demands. A new graph on the model page shows the number of active replicas for a given model over time.

Jan 13, 2023

Truss is an open-source library for packaging and serving machine learning models. In the latest minor version release, 0.2.0, we simplified the names of several core functions in Truss to create a cleaner interface. Here's an example of the new developer experience:

Truss code with new function names
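The snippet below sketches the new names, assuming a scikit-learn model in memory (the target directory is hypothetical):

import truss
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier().fit(X, y)

# truss.create() replaces truss.mk_truss()
tr = truss.create(model, "iris_truss")  # hypothetical target directory

# On the handle, predict() replaces server_predict();
# pass use_docker=True to run in a container instead (formerly docker_predict())
tr.predict({"inputs": [[5.1, 3.5, 1.4, 0.2]]})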

Interface changes:

  • In the Python client, truss.create() replaces truss.mk_truss().

  • In the Python client, truss.load() replaces truss.from_directory().

  • In the Truss handle, truss.predict() offers a shorter alternative to truss.server_predict(). To use in place of truss.docker_predict(), pass the optional kwarg use_docker=True.

  • In the command-line interface, the behavior of truss predict has been updated to match the Python client. Previously, truss predict ran on Docker by default, which could be overridden with RUN_LOCAL=true. Now, truss predict runs without Docker by default, which can be overridden with USE_DOCKER=true.

These interface changes are intended to improve Truss' developer experience, not cause unnecessary trouble. As such, the old mk_truss() and from_directory() functions, while marked with a deprecation warning, will not be removed until the next major version update. And both server_predict() and docker_predict() will be supported in the Truss handle indefinitely.

For a complete list of Truss changes version-to-version, consult the Truss release notes.

Jan 5, 2023

The new user interface for models puts model versions front and center. Previously in their own tab, model versions now have a dedicated sidebar to help you navigate different deployments of your model and review version statuses at a glance.

A screenshot of the updated model management UI

Here’s a summary of the other changes:

  • We’ve made it easier to promote a model version to Primary.

  • We’ve made it easier to activate & deactivate a model version.

  • We’ve moved the model version’s health metrics to the main tab.

  • We’ve moved the model readme and application list to their own tab.

  • We’ve added instructions for calling the primary version by default.

  • In all of this change, logs are right where you left them: the logs tab. But stay tuned for upcoming improvements to the logging experience!

Read the model management docs for even more details on using these refreshed pages.

Dec 23, 2022

By default, models deployed on Baseten run on a single instance with 1 vCPU and 2 GiB of RAM. This instance size is sufficient for some models and workloads, but demanding models and high-traffic applications need more resources to operate. Workspaces on any paid plan can now upgrade their own model resources and configure autoscaling.

The model resources configuration modal

As a user in a paid workspace, you configure the following resources:

  • Instance type: select among preconfigured levels of vCPUs and RAM

  • GPU instances: toggle instances to include a GPU for models that need it

  • Replica range: set a minimum and maximum number of replicas so autoscaling can handle load

The default 1x2 instance type is free, but higher resource configurations are subject to usage-based pricing, billed monthly. When you configure model resources, you can see the hourly rate for the selected instance type (instances are billed by the minute) along with an estimated monthly spend based on replica count.
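For example, at a hypothetical rate of $0.60 per hour, a model that accumulates 10 hours of active time in a month is billed 10 × $0.60 = $6.00, while a minimum of two always-on replicas would accrue roughly 2 × 24 × 30 × $0.60 ≈ $864 per month.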

For more information, see the docs on model resource configuration.

Model resourcing is an undeniably complex topic. We’ve worked to balance giving you control with keeping things straightforward, but everyone’s needs are different. Please don’t hesitate to reach out to support@baseten.co with any questions about configuring model resources or usage-based billing (or anything else).
