Baseten Hybrid: control and flexibility in your cloud and ours
Get the performance of a managed service in your own VPC, with seamless overflow to Baseten Cloud.
High-performance inference with seamless overflow

Flex your cloud
Maintain SLAs during traffic spikes, avoid vendor lock-in, and leverage existing cloud credits with our effortless multi-cloud routing.

Cut latency
With rapid cold starts and tailored model performance, our customers achieve lower overall latency and faster time to first token.

Designed for compliance
Keep sensitive workloads in your VPC, and lean on the SOC 2 Type II, HIPAA, and GDPR compliance of Baseten Cloud.
Choosing Baseten Hybrid, Self-hosted, or Cloud
Feature | Baseten Hybrid | Baseten Self-hosted | Baseten Cloud
---|---|---|---
Data control | Full data control in your VPC; managed data security on Baseten Cloud | Full data control | Managed data security; we never store model inputs or outputs |
Data residency requirements | Region-locked data and deployments with multi-region support | Region-locked data and deployments | Multi-region support with global deployment options |
Compute capacity | Leverage existing resources or Baseten compute for overflow | Leverage existing in-house resources | Leverage on-demand compute with SOTA GPUs |
Cost efficiency | Use in-house compute whenever available for optimized costs | Utilize dedicated resources without extra spend on hardware | Gain cost-effective, on-demand compute |
Integration with internal systems | Custom or out-of-the-box integrations | Custom or out-of-the-box integrations | Easy integration via Baseten's ecosystem |
Performance optimization | SOTA on-chip model performance and low network latency | SOTA on-chip model performance and low network latency | SOTA on-chip model performance and low network latency |
Scalability | High, tailored scalability with flex capacity on Baseten Cloud | High, tailored scalability | High, flexible scaling options |
Security and compliance | Adhere to custom policies and our SOC 2 Type II, HIPAA, and GDPR compliance | Adhere to custom organizational policies | SOC 2 Type II certified, HIPAA compliant, and GDPR compliant by default |
Support and maintenance | Comprehensive support and managed services | Comprehensive support and managed services | Comprehensive support and managed services |
Utilization of existing cloud commits | Use credits or commits | Use credits or commits | Spend down existing cloud commits |
Get the best of Self-hosted and Cloud deployments
Having lifelike text-to-speech requires models to operate with very low latency and very high quality. We chose Baseten as our preferred inference provider for Orpheus TTS because we want our customers to have the best performance possible. Baseten’s Inference Stack allows our customers to create voice applications that sound as close to human as possible.
Troy Astorino,
Co-founder and CTO