Changelog
Our latest product additions and improvements.
Remove default inputs dictionary in Truss and Baseten client
We paid down some technical debt and, in doing so, removed a papercut from the Baseten and Truss developer experience.
It used to be that all model invocations had to be formatted as:
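For instance, with a hypothetical deployed model handle, the call had to look something like this:

```python
# Old format: the input wrapped in a dictionary under the "inputs" key
model.predict({"inputs": [[1, 2, 3, 4]]})
```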
We cleaned up the model packaging templates in Truss so that you can invoke your model on any JSON-serializable input. Now, pass your lists, strings, numbers, and more directly to your model. In other words, now you can simply call your model with $MODEL_INPUT.
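Under the new behavior, the same hypothetical handle can be called directly:

```python
# New format: any JSON-serializable input, passed directly
model.predict([[1, 2, 3, 4]])
model.predict("a plain string works too")
```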
This change only affects newly created model versions that use truss > 0.4.0 and baseten > 0.5.0, so your invocations of existing models won't break. And you can always edit the predict() and preprocess() functions in your Truss's model.py to change your model input format.
Deploy models with live reload workflow
Real-world model deployment workflows can be a bit messy. The latest version of the Baseten Python client, version 0.3.0, gives you a cleaner path to production by introducing a live reload workflow to every model deployment. Live reload lets you test changes to your model in seconds rather than waiting for Docker to rebuild and your model to be redeployed.
Now, every deployed model starts as a draft model by default. Draft models support live reload for:
- Updates to model serving code
- Updates to required Python packages
- Updates to required system packages
And other changes, like editing environment variables and updating your model binary, can still be made in the draft stage without incrementing a model version, though they will require waiting for a full redeployment.

Once you're satisfied with your model, you can publish it to production resources. To publish your model, just pass publish=True as an argument to baseten.deploy(). You can also publish your model in the Baseten UI.
You can use this flag during your initial deployment to skip the draft model step if desired:
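A sketch, where my_model is an illustrative in-memory model object and the name is a placeholder:

```python
import baseten

# Skip the draft stage and deploy directly to production resources
baseten.deploy(my_model, model_name="My model", publish=True)
```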
For a complete guide to deploying models, read our newly refreshed model deployment docs.
Fix issues faster with improved logs UI

The only thing more frustrating than your code not working is not knowing why it isn't working. To make it easier to root-cause issues during model deployment and invocation, we separated build logs from deployment and prediction logs.
View active replicas

When you enable autoscaling in your model resources, Baseten dynamically adjusts the number of replicas of your model to meet traffic demands. A new graph on the model page shows the number of active replicas for a given model over time.
Truss 0.2.0: Improved developer experience
Truss is an open-source library for packaging and serving machine learning models. In the latest minor version release, 0.2.0, we simplified the names of several core functions in Truss to create a cleaner interface. Here's an example of the new developer experience:
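A sketch with a scikit-learn model (note that at 0.2.0, model inputs were still wrapped in the inputs dictionary covered in the first entry above):

```python
import truss
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small demo model
data = load_iris()
model = RandomForestClassifier().fit(data.data, data.target)

# Package the model as a Truss (replaces truss.mk_truss)
tr = truss.create(model, target_directory="./my_truss")

# Invoke the packaged model locally (replaces truss.server_predict)
tr.predict({"inputs": [[0, 0, 0, 0]]})

# Later, load the same Truss back from disk (replaces truss.from_directory)
tr = truss.load("./my_truss")
```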

Interface changes:
- In the Python client, truss.create() replaces truss.mk_truss().
- In the Python client, truss.load() replaces truss.from_directory().
- In the Truss handle, truss.predict() offers a shorter alternative to truss.server_predict(). To use in place of truss.docker_predict(), pass the optional kwarg use_docker=True.
- In the command-line interface, the behavior of truss predict has been updated to match the Python client. Previously, truss predict ran on Docker by default, which could be overridden with RUN_LOCAL=true. Now, truss predict runs without Docker by default, which can be overridden with USE_DOCKER=true.
These interface changes are intended to improve Truss' developer experience, not cause unnecessary trouble. As such, the old mk_truss() and from_directory() functions, while marked with a deprecation warning, will not be removed until the next major version update. And both server_predict() and docker_predict() will be supported in the Truss handle indefinitely.
For a complete list of Truss changes version-to-version, consult the Truss release notes.
Manage your models with updated UI
The new user interface for models puts model versions front and center. Previously in their own tab, model versions now have a dedicated sidebar to help you navigate different deployments of your model and review version statuses at a glance.

Here's a summary of the other changes:
- We've made it easier to promote a model version to Primary.
- We've made it easier to activate and deactivate a model version.
- We've moved the model version's health metrics to the main tab.
- We've moved the model readme and application list to their own tab.
- We've added instructions for calling the primary version by default.
- Amid all this change, logs are right where you left them: the logs tab. But stay tuned for upcoming improvements to the logging experience!
Read the model management docs for even more details on using these refreshed pages.
Configure your model resources
By default, models deployed on Baseten run on a single instance with 1 vCPU and 2 GiB of RAM. This instance size is sufficient for some models and workloads, but demanding models and high-traffic applications need more resources to operate. Workspaces on any paid plan can now upgrade their own model resources and configure autoscaling.

As a user in a paid workspace, you can configure the following resources:
- Instance type: select among preconfigured levels of vCPUs and RAM
- GPU instances: toggle instances to include a GPU for models that need it
- Replica range: set a minimum and maximum number of replicas to autoscale to handle load
The default 1x2 instance type is free, but higher resource configurations are subject to usage-based pricing, billed monthly. When you configure model resources, you can see the hourly rate for the selected instance types (instances are billed by the minute) along with an estimated monthly spend based on replica count.
For more information, see the docs on model resource configuration.
Model resourcing is an undeniably complex topic. We've worked to balance giving you control with keeping things straightforward, but everyone's needs are different. Please don't hesitate to reach out to support@baseten.co with any questions about configuring model resources or usage-based billing (or anything else).
Enable live reload with draft models
Slow dev loops break flow state and make for a frustrating experience. And for data scientists, slow dev loops make all but the most essential deployment workflows too expensive and time-consuming to even consider.
To speed up dev loops in model deployment, Baseten is introducing draft models. For more, read our blog post on using this feature to accelerate your workflows.
By default, the baseten.deploy() command deploys your model as a draft. Here's a simple example:
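A sketch, where my_model is an illustrative in-memory model object:

```python
import baseten

# Deploys as a draft model by default; no extra flags required
baseten.deploy(my_model, model_name="My draft model")
```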
When you're ready to publish your model, just pass publish=True to the same deploy command.
To get started with draft models, read the docs or try our demo notebook!
Weekly round-up: Deploy Flan-T5 XL on Baseten
Flan-T5 XL is an open-source large language model developed by Google. Flan-T5 is an instruction-tuned model, meaning that it exhibits zero-shot-like behavior when given instructions as part of the prompt. You can learn more about instruction tuning on Google's blog.
The model also comes with a starter app so that you can experiment with instruction tuning. You can give it a try here!

If you want to fine-tune and build with state-of-the-art models like Flan-T5, check out what we are working on with Blueprint and join the waitlist for early access.
Weekly round-up: Live reload Truss in Docker dev environment
The latest release of Truss, version 0.1.5, introduces a live reload mechanism to improve developer velocity when working with Docker.
Docker is great because it makes your development environment nearly identical to your production environment. But that comes at the expense of rebuilding your environment when you make changes to your Truss. With live reload, you can now make changes to your model code and keep the same Docker container running, which can save several minutes every time you change your code.
To enable this feature, install the latest version of Truss and set live_reload = True in your Truss config file.
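For reference, a sketch of the setting, assuming the standard YAML config file (config.yaml):

```yaml
# config.yaml
live_reload: true
```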
Weekly round-up: Python environment per application
Until today, applications on your Baseten account shared a single Python environment. Now, you can install Python packages from PyPI or system packages like ffmpeg on an app-by-app basis. What's more, draft and production versions of the same application also run in different environments.
This means that you can:
- Install or upgrade a Python package without affecting applications in production
- Run different versions of the same package in different applications
- Publish and manage your code and dependencies in sync
Baseten's application builder is designed for making apps to handle real production use cases, and this change gives you an even more flexible, robust developer experience.
The pumpkin patch
This week's small-but-mighty changes to bring more magic to your models!
Use more keyboard shortcuts: Accelerate your workflows with a dozen new view builder keyboard shortcuts, listed here. My favorite: nudge components around the view with arrow keys.
Copy-and-paste improvements: Multiselect and copy-and-paste between views now work together, and pasting multiple components preserves their relative layout.
Special issue: Deploy MLflow models on Baseten
Baseten now supports MLflow models via Truss. MLflow is a popular library for model experimentation and model management with over ten million monthly downloads on PyPI. With MLflow, you can train a model in any framework (PyTorch, TensorFlow, XGBoost, etc.) and access features for tracking, packaging, and registering your model. And now, deploying to Baseten is a natural extension of MLflow-based workflows.
Deploying an MLflow model looks a bit like this:
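A rough sketch of that workflow (the run URI is a placeholder, and the deploy call assumes the Baseten client accepts the in-memory pyfunc model, as described below):

```python
import mlflow
import baseten

# Load a previously logged MLflow model through the pyfunc interface
model = mlflow.pyfunc.load_model("runs:/<RUN_ID>/model")

# Deploy to Baseten, which packages the pyfunc model via Truss
baseten.deploy(model, model_name="My MLflow model")
```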
For a complete runnable example, check out this demo on Google Colab.
Baseten uses MLflow's pyfunc module to load the model and packages it via Truss. To learn more about packaging MLflow models for deployment, consult the Truss documentation on MLflow.
Special issue: Use Stable Diffusion instantly as an API
What if instead of painstakingly configuring Stable Diffusion to run locally or paying for expensive cloud GPUs, you could deploy it in a couple of clicks? And better still, it would be instantly available as an authenticated API?
Baseten has added Stable Diffusion to our model library so you can do exactly that. Simply deploy the pre-trained model on your Baseten account, then try it via the starter app or call it through the built-in API.
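A rough sketch of calling the deployed model from Python (the endpoint path, payload shape, and placeholders are illustrative; see your model's page in the Baseten app for the exact call):

```python
import requests

# Invoke the deployed Stable Diffusion model over its authenticated API
resp = requests.post(
    "https://app.baseten.co/models/MODEL_ID/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"prompt": "a watercolor painting of a lighthouse at dawn"},
)
print(resp.json())
```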

Deploy Stable Diffusion today and build awesome tools for generating everything from avatars to Zoom backgrounds.
Weekly round-up: Wait less. Build more!
Explore models with guidance
Often, the hardest part of a project is getting started. And when you're getting started with an unfamiliar model, there are a few things you want to do: try it on a variety of inputs, parse its output to a usable form, and tweak its configuration to meet your needs.

Baseten's library of models now features comprehensive updated READMEs for many of our most popular models, with more coming soon.
Load Baseten up to ten times faster
Baseten power users are filling their workspaces with powerful models and dynamic apps. And we found that as the number and size of deployed systems grew on an account, load times shot way up. So we refactored the user interface to load much faster.
But saying "the website is way faster" is hardly useful information. Here's a table showing how much loading time is saved:

Saving time on your MLOps isn't just about removing clunky hours-long deploy processes. We also care about saving you seconds at the margin.
Weekly round-up: Deploy OpenAI Whisper instantly
We added Whisper, a best-in-class speech-to-text model, to our library of pre-trained models. That means you can deploy Whisper instantly on your Baseten account and build applications powered by the most sophisticated transcription model available.

You can deploy Whisper from its model page in the Baseten app. Just sign in or create an account and click "Deploy." The model and associated starter app will be added to your workspace instantly. Or, try the model first with our public demo.
Review improved model logs
In a comprehensive overhaul, we made model logs ten times shorter but way more useful. Here's what we changed:
- Build logs are now separated into steps for easier skimming
- Model deployment logs are surfaced just like build logs
- Model OOMs are now reported
- Many extraneous log statements have been deleted
OOM logging is a particularly important improvement. An OOM, or out-of-memory error, is a special lifecycle event that we monitor for on Kubernetes. This error means that the model is too big for the infrastructure provisioned for it. Existing logging solutions don't capture these errors, resulting in frustrating debugging sessions, so we built a special listener to let you know about OOMs right away.

Weekly round-up: Select multiple components in the view builder
In the view builder, you can now select multiple components at the same time and move them as a single block. You can also bulk duplicate and bulk delete multiple selected components.

To select multiple components, either use Command-click on each component you wish to select, or drag your cursor over an area of the screen to select everything within its path.
The pumpkin patch
This week's small-but-mighty changes to bring more magic to your models!
Set image empty state: You can now specify custom text to appear in an image component when no image is present.

Remove canvas frame: You can hide the canvas frame in your application to give the published views a consistent all-white background.
