Customer stories
We're creating a platform for progressive AI companies to build their products on the fastest, most performant infrastructure available.
What our customers are saying
Sahaj Garg,
Co-Founder and CTO
Inference for custom-built LLMs could be a major headache. Thanks to Baseten, we’re getting cost-effective high-performance model serving without any extra burden on our internal engineering teams. Instead, we get to focus our expertise on creating the best possible domain-specific LLMs for our customers.
DJ Zappegos,
Engineering Manager
We are constantly testing and iterating to build the best-in-class retrieval models. Baseten Training allows my team to focus fully on training without needing to worry about hardware and job orchestration. We moved all of our training jobs to Baseten so our researchers have more flexibility to build our foundational models. If we had this when we were first starting out it would have saved us a lot of time and headaches.
Wispr Flow creates effortless voice dictation with Llama on Baseten
Wispr Flow runs fine-tuned Llama models with Baseten and AWS to provide seamless dictation across every application.
Read case study
Rime serves speech synthesis API with stellar uptime using Baseten
Rime AI chose Baseten to serve its custom speech synthesis generative AI model and achieved state-of-the-art p99 latencies with 100% uptime in 2024.
Read case study
Bland AI breaks latency barriers with record-setting speed using Baseten
Bland AI leveraged Baseten’s state-of-the-art ML infrastructure to achieve real-time, seamless voice interactions at scale.
Read case study
Custom medical and financial LLMs from Writer see 60% higher tokens per second with Baseten
Writer, the leading full-stack generative AI platform, launched new industry-specific LLMs for medicine and finance. Using TensorRT-LLM on Baseten, they increased their tokens per second by 60%.
Read case study
Patreon saves nearly $600k/year in ML resources with Baseten
With Baseten, Patreon deployed and scaled the open-source foundation model Whisper at record speed without hiring an in-house ML infra team.
Read case study