By default, models deployed on Baseten run on a single instance with 1 vCPU and 2 GiB of RAM. This instance size is sufficient for some models and workloads, but demanding models and high-traffic applications need more resources to operate. Workspaces on any paid plan can now upgrade their own model resources and configure autoscaling.
As a user in a paid workspace, you can configure the following resources:
Instance type: Select from preconfigured combinations of vCPUs and RAM
GPU instances: Add a GPU to instances for models that need one
Replica range: Set the minimum and maximum number of replicas so the model can autoscale with load
The default 1x2 instance type (1 vCPU, 2 GiB of RAM) is free, but larger resource configurations are subject to usage-based pricing, billed monthly. When you configure model resources, you can see the hourly rate for the selected instance type (usage is billed by the minute) along with an estimated monthly spend based on replica count.
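To get a feel for how an hourly rate and a replica range might translate into a monthly figure, here is a rough sketch of the arithmetic. It is not Baseten's billing logic: the function names, the example rate, and the 730-hour month (365 * 24 / 12) are all assumptions made for illustration, and actual charges depend on how many replicas run and for how long.

```python
# Illustrative only: these helpers and the rate used with them are hypothetical,
# not part of any Baseten API or price list.

HOURS_PER_MONTH = 730  # assumption: average month length, 365 * 24 / 12


def estimate_monthly_spend(hourly_rate_usd, replicas, hours_per_month=HOURS_PER_MONTH):
    """Estimate spend if `replicas` instances run for the whole month."""
    if hourly_rate_usd < 0 or replicas < 0:
        raise ValueError("rate and replica count must be non-negative")
    return hourly_rate_usd * replicas * hours_per_month


def spend_range(hourly_rate_usd, min_replicas, max_replicas):
    """Return (low, high) estimates for an autoscaling replica range."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    low = estimate_monthly_spend(hourly_rate_usd, min_replicas)
    high = estimate_monthly_spend(hourly_rate_usd, max_replicas)
    return low, high


def charge_for_minutes(hourly_rate_usd, minutes):
    """Per-minute billing: prorate the hourly rate over the minutes used."""
    if hourly_rate_usd < 0 or minutes < 0:
        raise ValueError("rate and minutes must be non-negative")
    return hourly_rate_usd * minutes / 60


if __name__ == "__main__":
    rate = 0.50  # hypothetical hourly rate in USD, not a real price
    low, high = spend_range(rate, min_replicas=1, max_replicas=3)
    print(f"Estimated monthly spend: ${low:.2f} to ${high:.2f}")
    print(f"Cost of a 90-minute burst on one replica: ${charge_for_minutes(rate, 90):.2f}")
```

With autoscaling, real spend should land somewhere between the two bounds, because additional replicas only accrue charges for the minutes they are running.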
For more information, see the docs on model resource configuration.
Model resourcing is a complex topic. We’ve worked to balance giving you control with keeping things straightforward, but everyone’s needs are different. Please don’t hesitate to reach out to firstname.lastname@example.org with any questions about configuring model resources, usage-based billing, or anything else.