Company overview
OpenEvidence is a medical technology company that uses AI to sift through the massive amount of medical research available to healthcare professionals. Their AI-powered search platform provides fast, accurate, and up-to-date medical information at the point of care.
Inference challenges
When the OpenEvidence team first connected with Baseten, they were powering inference in-house. As the scale of the product expanded to hundreds of thousands of clinicians around the world, managing and optimizing inference infrastructure took more and more time away from focus on where OpenEvidence uniquely shines: building best-in-class tools for physicians.
To enable them to double down on their mission, the OpenEvidence team was looking for an inference partner that could:
Maintain reliable performance when traffic spiked, with redundancy in case of hardware failures or capacity constraints.
Save engineering time on model and infra management without OpenEvidence massively scaling its infrastructure teams.
Provide flexible access to compute without needing to sign multi-year contracts for more GPUs than they would consistently need.
"With Baseten, it’s so fluid to deploy and test models that we are freed up to build better and faster."
Solutions
Baseten’s forward-deployed and model performance teams worked as an extension of OpenEvidence’s engineering team to optimize their workloads, focusing on:
Providing flexible infrastructure with Multi-cloud Capacity Management (MCM) and on-demand compute access to reliably scale models across clouds and regions, handling traffic spikes and increased overall demand
Improving performance with Baseten Embeddings Inference (BEI) for the highest embeddings throughput and lowest latency on the market, and the Baseten Performance Client to remove client-side throughput bottlenecks
Reducing management overhead with developer-friendly tooling and observability for fast experimentation and deployment cycles
Onboarding to Baseten Training for powerful, pre-configured training infrastructure and faster training cycles
“The reality is, making a deployment meet our standards isn’t easy. Baseten simplifies that process, allowing us to deploy much faster while still hitting our performance and accuracy goals. What used to take days now takes hours.”
Results
With Baseten’s inference-optimized infrastructure and Embedding Model Runtime (both part of the Baseten Inference Stack), OpenEvidence achieved higher throughput, better cost-efficiency, and lower latency, while streamlining its model deployment and management processes.
“Our team spent weeks researching and vetting inference providers. It was a thorough process and we confidently believe Baseten is a clear winner. Baseten has helped us abstract away so much of the complexity of AI model deployments and MLOps. On Baseten, things just work out of the box; this has saved us countless engineering hours. It’s made a huge difference in our productivity as a team; most of our engineers have experience now in training and deploying models on Baseten. Every time we start an ML project, we think about how quickly we can get things going through Baseten.”
Baseten’s Multi-cloud Capacity Management system ensures that OpenEvidence can scale efficiently even in the face of traffic spikes, hardware failures, or capacity constraints, powering a reliable product experience. As OpenEvidence grows (the company now works with doctors in every state and zip code in America), it has also been able to flexibly access more compute on demand without locking into multi-year commitments with single cloud vendors.
By using Baseten, OpenEvidence achieved:
78% lower latency, from over 700 milliseconds to 160 milliseconds end-to-end
6x faster deployment processes, from multiple engineers spending hours on a deployment to one engineer spending less than an hour
8x+ reduction in infrastructure maintenance time overall
“With Baseten Embeddings Inference, we immediately saw 3x speed improvements. Doctors rely on speed when treating patients, and that improvement has been critical to our product experience. 160 millisecond latency is crazy.”
As a result, OpenEvidence has been able to serve massive increases in usage without scaling its infrastructure team, while its team iterates faster and more effectively on the core product experience.
“The deployment process used to take up so much of our time. Now, it’s as simple as a few commands, and we’re done. What used to take hours now takes less than one, and the reduced maintenance means we can focus on improving our core product.”
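The exact commands aren’t spelled out in this case study, but Baseten’s open-source packaging tool, Truss, gives a feel for the short deployment loop the quote describes. The sketch below is purely illustrative: the model name is hypothetical, not OpenEvidence’s actual setup, and a Baseten API key is required for the final step.

```shell
# Illustrative sketch only -- "my-embedding-model" is a hypothetical name,
# not OpenEvidence's actual model or configuration.

# Install Baseten's open-source model packaging tool
pip install --upgrade truss

# Scaffold a new model package (creates config.yaml and model/model.py)
truss init my-embedding-model
cd my-embedding-model

# Deploy to Baseten (prompts for a Baseten API key on first use)
truss push
```

Once pushed, the model gets a hosted endpoint, and subsequent changes redeploy with another `truss push`, which is the kind of minutes-long iteration loop described above.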
Model training
The OpenEvidence research team shared that accessing GPUs through cloud providers was a major bottleneck for them, with engineers often waiting late into the night for a chance to train their models. Switching to Baseten Training completely transformed their experience.
The team found the platform “so easy” to use, eliminating the need for manual VM setup or complex GPU management. Even engineers with no prior ML background were able to train powerful models in under 30 minutes using Baseten’s sample scripts, achieving performance nearly identical to expert-built versions that would have taken weeks otherwise.
With Baseten Training, OpenEvidence successfully post-trained its model, making it 23x faster, and is projected to save $1.9 million in training costs. Beyond speed and cost savings, Baseten made cutting-edge model development accessible, streamlined, and highly scalable for their growing needs.
“Baseten helped us train models to be 23x faster and is projected to save us $1.9M—while making the process so easy that even non-ML engineers could get results in under 30 minutes.”
What’s next
Baseten’s team continues to partner with OpenEvidence to push the boundaries of embedding inference performance and model training. Tools like Baseten Embeddings Inference (BEI) and the Baseten Performance Client were developed with workloads like OpenEvidence’s in mind, and we’re constantly testing out new ways to increase throughput and lower latency—without sacrificing quality.
“Baseten supports billions of custom, fine-tuned LLM calls per week from OpenEvidence, serving high-stakes medical information to healthcare providers in every major healthcare facility in the country. If you see a doctor today, chances are that they are leveraging OpenEvidence for trustworthy, up-to-date medical information at their fingertips. Baseten’s tireless dedication to reliability and deep support at scale has proven up to the task of supporting this at times literally life-or-death mission.”
Check out OpenEvidence’s website for fast, accurate, up-to-date medical information that helps medical professionals make confident, well-informed decisions at the point of care.