Baseten Hybrid: control and flexibility in your cloud and ours
Get the performance of a managed service in your own VPC, with seamless overflow to Baseten Cloud.
High-performance inference with seamless overflow
Flex your cloud
Maintain SLAs during traffic spikes, avoid vendor lock-in, and leverage existing cloud credits with our effortless multi-cloud routing.
Cut latency
With rapid cold starts and tailored model performance, our customers achieve lower overall latency and faster time to first token.
Designed for compliance
Keep sensitive workloads in your VPC, and lean on the SOC 2 Type II, HIPAA, and GDPR compliance of Baseten Cloud.
Choosing Baseten Hybrid, Self-hosted, or Cloud
Feature | Baseten Hybrid | Baseten Self-hosted | Baseten Cloud |
---|---|---|---|
Data control | Full data control in your VPC; managed data security on Baseten Cloud | Full data control | Managed data security; we never store model inputs or outputs |
Data residency requirements | Region-locked data and deployments with multi-region support | Region-locked data and deployments | Multi-region support with global deployment options |
Compute capacity | Leverage existing resources or Baseten compute for overflow | Leverage existing in-house resources | Leverage on-demand compute with SOTA GPUs |
Cost efficiency | Use in-house compute whenever available for optimized costs | Utilize dedicated resources without extra spend on hardware | Gain cost-effective, on-demand compute |
Integration with internal systems | Custom or out-of-the-box integrations | Custom or out-of-the-box integrations | Easy integration via Baseten's ecosystem |
Performance optimization | SOTA on-chip model performance and low network latency | SOTA on-chip model performance and low network latency | SOTA on-chip model performance and low network latency |
Scalability | High, tailored scalability with flex capacity on Baseten Cloud | High, tailored scalability | High, flexible scaling options |
Security and compliance | Adhere to custom policies and our SOC 2 Type II, HIPAA, and GDPR compliance | Adhere to custom organizational policies | SOC 2 Type II certified, HIPAA compliant, and GDPR compliant by default |
Support and maintenance | Comprehensive support and managed services | Comprehensive support and managed services | Comprehensive support and managed services |
Utilization of existing cloud commits | Use credits or commits | Use credits or commits | Spend down existing cloud commits |
Get the best of Self-hosted and Cloud deployments
Baseten supports billions of custom, fine-tuned LLM calls per week from OpenEvidence, serving high-stakes medical information to healthcare providers in every major healthcare facility in the country. If you see a doctor today, chances are that they are leveraging OpenEvidence for trustworthy, up-to-date medical information at their fingertips. Baseten’s tireless dedication to reliability and deep support at scale has proven up to the task of supporting this at times literally life-or-death mission.
Baseten cut our P95 latency by 80% across the dozens of fine-tuned embedding models that power core features in Superhuman's AI-native email app. Superhuman is all about saving time. With Baseten, we're delivering a faster product for our customers while reducing engineering time spent on infrastructure.
Loïc Houssier, CTO