Introducing Baseten Hybrid: control and flexibility in your cloud and ours

A GIF of Baseten Hybrid: workloads run in your cloud, with optional spillover to Baseten Cloud

Baseten Hybrid enables you to run inference in your own VPC, with optional spillover to Baseten Cloud for on-demand flex compute.

We’re excited to introduce early access to Baseten Hybrid, a multi-cloud solution that enables you to run inference in your cloud—with the ability to add flex capacity in ours.

We’ve seen it time and again: AI builders want to self-host some of their workloads for compliance reasons, but when traffic picks up, they need access to additional compute. With Baseten Hybrid, you have complete control over your policies and workloads while gaining true cloud elasticity: seamlessly spill over to Baseten Cloud, with zero engineering effort required.

Whether you’re using cloud credits or negotiating GPU commits with different cloud providers, you can bring them to Baseten and still get the performance, scalability, and security we excel at.

No other solution on the market offers this level of multi-cloud flexibility.

How Baseten Hybrid works

Baseten Hybrid combines our two solutions for ML inference: Self-hosted and Cloud. Keep sensitive workloads securely on your own cloud to meet specific data residency requirements or fully utilize in-house resources. When you need extra compute, spill over to ours. You can specify regions to reduce latency and meet compliance requirements, and you don't need to do any extra work to make your workloads compatible with different platforms.

Many companies label their multi-cloud solutions as "hybrid," but in reality, they just shuffle all of your workloads between public cloud providers. Instead of waiting for enough compute to run everything on one cloud, we dynamically distribute your workloads across any available capacity. You get true cloud elasticity and agnosticism, with zero time investment required; our infra takes care of that.
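To make the spillover idea concrete, here is a minimal sketch of a routing policy of this kind: prefer your own cloud, and use flex capacity only when your own pool is full and the workload is allowed to leave your VPC. This is an illustration under our own assumptions, not Baseten's implementation or API; the `Pool` class, the `route` function, the pool names, and the capacity numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A pool of compute (hypothetical); capacity = max concurrent requests."""
    name: str
    capacity: int
    in_flight: int = 0

    def has_room(self) -> bool:
        return self.in_flight < self.capacity


def route(pinned_to_vpc: bool, own_cloud: Pool, flex_cloud: Pool) -> str:
    """Pick a pool for one request and return its name.

    Prefer the self-hosted pool. Spill over to flex capacity only when the
    self-hosted pool is full and the workload may leave the VPC. If neither
    pool can take the request right now, report it as "queued".
    """
    if own_cloud.has_room():
        own_cloud.in_flight += 1
        return own_cloud.name
    if not pinned_to_vpc and flex_cloud.has_room():
        flex_cloud.in_flight += 1
        return flex_cloud.name
    return "queued"


# Illustrative numbers only: a small self-hosted pool and a smaller flex pool.
own = Pool("own-vpc", capacity=2)
flex = Pool("flex-cloud", capacity=1)

# Five requests arrive in a burst; True marks a workload that must stay in the VPC.
decisions = [route(pinned, own, flex) for pinned in (False, True, False, True, False)]
print(decisions)
# ['own-vpc', 'own-vpc', 'flex-cloud', 'queued', 'queued']
```

In this toy run, the first two requests land on the self-hosted pool, the third spills over, and the pinned fourth request waits instead of leaving the VPC. A real system would also release capacity as requests finish and account for regions, which this sketch leaves out.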

Gain full support and management from our team of expert engineers, with zero-downtime deployments and no-disruption infrastructure updates. Eliminate vendor lock-in while utilizing your existing GPU allocation, spend commit, and credits with cloud providers like AWS and GCP.

Hybrid cloud use cases

Gain early access to Baseten Hybrid for:

Meeting specific security needs. Self-host workloads according to strict data residency requirements, IP protection, requirements from customers, or regulations that mandate that inference be run on your cloud.
Multi-cloud elasticity and spillover. Enjoy the same experience as Baseten Cloud, while defining which workloads run where. Use your GPUs whenever compute is available, and spill over to our compute to maintain SLAs during traffic spikes.
Spend down cloud commits and avoid vendor lock-in. Our multi-cloud solution enables enterprises to flexibly use any cloud vendor while utilizing existing GPU allocation, spend commit, and credits.
Blazing-fast inference. Our ML infra is optimized for ultra-low-latency inference with elastic scale. We use the best inference optimization techniques (including TensorRT/TensorRT-LLM and vLLM) and model serving tooling, with instance- and network-level improvements for blazing-fast cold starts and end-to-end latency.
Compound AI systems. Baseten Self-hosted works with any setup, including compound AI. Build modular, scalable, efficient pipelines using multiple models and processing steps, while optimizing GPU utilization and reducing latency.

A GIF showing the flow of data through a modularized workflow built with Baseten Chains

A speech-to-text Chain with custom autoscaling for each step.

Baseten Hybrid vs. Baseten Self-hosted vs. Baseten Cloud

Baseten Hybrid combines the best of Baseten Self-hosted and Baseten Cloud.

For a more detailed comparison, check out our guide to choosing the right deployment option for your inference workloads.

Our mission is to support companies with highly performant, reliable, and secure AI infrastructure customized to their needs. We built Baseten Hybrid to enable you to do inference in your own cloud and meet stringent security requirements—without compromising on performance. Future-proof your infrastructure against traffic bursts and maintain SLAs without additional capital expenditure.

If we can help you meet your security and compliance needs while providing necessary resources, scalability, and performance, get in touch!
