Customer stories
We're creating a platform for progressive AI companies to build their products on the fastest, most performant infrastructure available.
What our customers are saying
Sahaj Garg,
Co-Founder and CTO
Inference for custom-built LLMs could be a major headache. Thanks to Baseten, we’re getting cost-effective high-performance model serving without any extra burden on our internal engineering teams. Instead, we get to focus our expertise on creating the best possible domain-specific LLMs for our customers.
DJ Zappegos,
Engineering Manager
We are constantly testing and iterating to build the best-in-class retrieval models. Baseten Training allows my team to focus fully on training without needing to worry about hardware and job orchestration. We moved all of our training jobs to Baseten so our researchers have more flexibility to build our foundational models. If we had this when we were first starting out it would have saved us a lot of time and headaches.
Wispr Flow creates effortless voice dictation with Llama on Baseten
Wispr Flow runs fine-tuned Llama models with Baseten and AWS to provide seamless dictation across every application.
Read case study
Rime serves speech synthesis API with stellar uptime using Baseten
Rime AI chose Baseten to serve its custom speech synthesis generative AI model and achieved state-of-the-art p99 latencies with 100% uptime in 2024.
Read case study
Bland AI breaks latency barriers with record-setting speed using Baseten
Bland AI leveraged Baseten’s state-of-the-art ML infrastructure to achieve real-time, seamless voice interactions at scale.
Read case study
Custom medical and financial LLMs from Writer see 60% higher tokens per second with Baseten
Writer, the leading full-stack generative AI platform, launched new industry-specific LLMs for medicine and finance. Using TensorRT-LLM on Baseten, they increased their tokens per second by 60%.
Read case study
Patreon saves nearly $600k/year in ML resources with Baseten
With Baseten, Patreon deployed and scaled the open-source foundation model Whisper at record speed without hiring an in-house ML infra team.
Read case study