Customer stories
We're creating a platform for progressive AI companies to build their products on the fastest, most performant infrastructure available.
What our customers are saying
You guys have literally enabled us to hit insane revenue numbers without ever thinking about GPUs and scaling. We would be stuck in GPU AWS land without y'all. Truss files are amazing, y'all are on top of it always, and the product is well thought out. I know I ask for a lot so I just wanted to let you guys know that I am so blown away by everything Baseten.
I want the best possible experience for our users, but also for our company. Baseten has hands down provided both. We really appreciate the level of commitment and support from your entire team.
Nathan Sobo,
Co-founder
Baseten supports billions of custom, fine-tuned LLM calls per week from OpenEvidence, serving high-stakes medical information to healthcare providers in every major healthcare facility in the country. If you see a doctor today, chances are that they are leveraging OpenEvidence for trustworthy, up-to-date medical information at their fingertips. Baseten’s tireless dedication to reliability and deep support at scale has proven up to the task of supporting this at times literally life-or-death mission.
Baseten cut our P95 latency by 80% across the dozens of fine-tuned embedding models that power core features in Superhuman's AI-native email app. Superhuman is all about saving time. With Baseten, we're delivering a faster product for our customers while reducing engineering time spent on infrastructure.
Loïc Houssier,
CTO
Customer Stories
Superhuman achieves 80% faster embedding model inference with Baseten
Superhuman cut P95 latency by 80% across dozens of custom embedding models in just one week after adopting Baseten Embedding Inference.
OpenEvidence delivers instant, accurate medical information with Baseten
OpenEvidence partners with Baseten for their inference infrastructure to focus on what they do best: making exceptional tools for physicians.
Latent delivers pharmaceutical search with 99.999% uptime on Baseten
Latent Health uses Baseten to power fast, reliable clinical AI.
Praktika delivers ultra-low-latency transcription for global language education with Baseten
With Baseten, Praktika delivers latency under 300 milliseconds, giving language learners worldwide a seamless conversational and learning experience.
Zed Industries serves 2x faster code completions with the Baseten Inference Stack
By partnering with Baseten, Zed achieved 45% lower latency, 3.6x higher throughput, and 100% uptime for their Edit Prediction feature.
Wispr Flow creates effortless voice dictation with Llama on Baseten
Wispr Flow runs fine-tuned Llama models with Baseten and AWS to provide seamless dictation across every application.
Rime serves speech synthesis API with stellar uptime using Baseten
Rime AI chose Baseten to serve its custom generative speech synthesis model and achieved state-of-the-art p99 latencies with 100% uptime in 2024.
Bland AI breaks latency barriers with record-setting speed using Baseten
Bland AI leveraged Baseten’s state-of-the-art ML infrastructure to achieve real-time, seamless voice interactions at scale.
Baseten powers real-time translation tool toby to Product Hunt podium
The founders of toby worked with Baseten to deploy an optimized Whisper model on autoscaling hardware just one week ahead of their Product Hunt launch and had a top-three finish with zero downtime.
Custom medical and financial LLMs from Writer see 60% higher tokens per second with Baseten
Writer, the leading full-stack generative AI platform, launched new industry-specific LLMs for medicine and finance. Using TensorRT-LLM on Baseten, they increased their tokens per second by 60%.
Patreon saves nearly $600k/year in ML resources with Baseten
With Baseten, Patreon deployed and scaled the open-source foundation model Whisper at record speed without hiring an in-house ML infra team.