Patreon saves nearly $600k/year in ML resources with Baseten

440+
hours of dev time saved per year
$600k
of resources saved per year
70%
savings in GPU cost

Background

Patreon is the ultimate platform for creators to offer memberships, provide exclusive access to their work, and cultivate deep connections with their community of supporters and fans. To date, Patreon supports over 250,000 creators with 8 million active monthly patrons. In 2021 alone, these creators earned over $1 billion from memberships.

To best serve their large-scale user base, Patreon deploys sophisticated ML solutions at record speed — thanks to Baseten’s ML infrastructure.

The Problem

As their user base scaled, the Patreon team needed to dedicate time to improving the customer experience of the platform — not hacking away at an unending ML infrastructure checklist. 

According to Nikhil Harithas, Senior ML Engineer at Patreon, handling ML models and gauging when to implement each one isn’t inherently difficult. It is, however, a time-consuming, cumbersome, and costly process, and deciding which combination of components is right isn’t a corner the team can cut.

From a business perspective, Patreon needed to transcribe all the audio and video content uploaded by their creators, a substantial task in itself. On top of that workload, the team also needed to add auto-generated closed captions to their native video product to improve accessibility.

Then OpenAI released Whisper, a neural network for speech-to-text that fit the transcription task. Patreon tried serving Whisper for these new use cases with existing batch-compute products, but they were met with repeated challenges: prohibitively high costs, heavy lifting from their engineering team, or failure to meet Patreon’s data security needs.

So, their central aim became minimizing time spent on ML infrastructure tasks by serving generative AI models scalably and cost-effectively. Enter: Baseten.

The Solution

Plugging Baseten into their toolset was a game-changer for the Patreon team. They were able to deploy and massively scale the Whisper transcription model, without needing help from infra-minded engineers. 

“The great thing about Baseten is they’re able to get something up and working for you really, really quickly. Seeing a substantial piece of work functioning within a couple of days is such an impressive, promising work rate.” 

The Result

Since the two teamed up, Baseten has supercharged Patreon’s ability to serve generative AI models at record speed and low ML infrastructure cost. This in turn has let their team focus on what differentiates their product rather than on ML infrastructure.

For any business, the most valued assets are people and their time. On that front, Nikhil points out that Baseten saved his team the time and cost of hiring two full-time ML engineers. He adds that Baseten is half the price of serving Whisper via OpenAI, while offering fine-grained control along with security and compliance guarantees.

In concrete terms, adopting Baseten delivered three key results:

  • 440+ hours of dev time saved per year 

  • $600k of resources saved per year 

  • 70% savings in GPU cost

"Running our models on Baseten is more cost-effective than anything else in terms of vertical-specific solutions. It’s twice as cheap as the next cheapest solution. That plus their product improvement velocity and ticking the right compliance markers for us made working with Baseten the perfect partnership." 

The Future

Speed is critical to project iteration because it reveals the unknowns in the problem space you must work through. With Baseten, the Patreon team was able to ship a sophisticated ML model within days, far faster than they could have accomplished independently.

Looking forward, Nikhil is excited about what that speed of experimentation, iteration, and deployment will mean for his team’s ability to provide the best possible Patreon experience.

Explore Baseten today

We love partnering with companies developing innovative AI products by providing the most customizable model deployment with the lowest latency.