Machine learning has matured at mind-boggling speed. The decrease in cost of compute has enabled innovative model architectures that in turn have resulted in powerful state-of-the-art models with real utility. We are no longer technically limited by what machine learning can achieve.
But in my experience, there’s still a huge gap between knowing a business problem can be solved with ML and actually shipping an ML solution. We saw this first-hand at Yelp, Gumroad, and Clover Health, and we’ve heard it repeatedly from practitioners at organizations of every shape and size.
In 2022, a Data Scientist’s job is really four jobs in one: Product Owner, Developer, Designer, and Data Scientist. First you need to find use cases. So you partner with stakeholders, gather requirements, and prioritize accordingly. Then comes the model building — preparing data, training a model, and iterating to maximize performance. More than enough to keep your hands full, right?
But models sitting on your local machine aren’t enough. They need to be integrated back into the business: served behind scalable APIs, given read and write access to databases and, if you’re building for business users, delivered alongside a UI that lets them see and interact with your predictions.
This staggering amount of work is best summarized in this diagram by Lj Miranda, an ML Engineer at spaCy:
Shipping an ML solution from start to finish involves two loops: model development (the right loop) and model delivery (the left loop). Model development is typically well-understood by Data Scientists. But as we’ve mentioned, a model alone isn’t sufficient. It needs to be packaged and integrated as a software component. Enter model delivery, a completely distinct set of tasks that requires software engineering resources or know-how.
In short, being a Data Scientist is a hard job. That’s why best-in-class teams at well-resourced organizations have built entire ML platform teams to spread the burden (a recent conversation with an ML lead at a FAANG revealed they had over 100 engineers to support their 100 scientists within just one product area). But for the majority of data science teams, this approach is both prohibitively expensive and hard to hire for.
Instead, data science teams take it upon themselves to act as both scientist and engineer. For these folks, we found that model development could be done relatively quickly — less than 4 weeks in many cases. But model delivery could take an additional 8 to 16 weeks.
Although ML has matured rapidly, it’s clear that the supporting infrastructure and tooling needed to actually implement it is still abysmally immature.
So, I joined forces with my friends and former teammates Amir and Philip and began to wonder: what if we could productize all the infrastructure, server-side, and front-end work involved in model delivery so data scientists can focus on doing what they love: training models and solving business problems?
With an amazing group of early customers, including Data Scientists and ML Engineers at Patreon, Pipe, SIL, and Primer, we spent the past two years building Baseten, an ML Application Builder for Data Scientists. Baseten makes it easy to deploy machine learning models and integrate them into new and existing business processes with scalable APIs and interactive applications, without needing to learn anything about containers, Flask, or React.
How does it work?
1. Serve models easily and quickly
We believe Data Scientists shouldn’t have to spend countless hours wrangling Docker and AWS to get their models behind scalable, robust APIs. We’ve built simple APIs that allow you to get models served and an intuitive UI to manage, monitor, and configure infrastructure when needed.
Baseten supports most major modeling libraries, including scikit-learn, PyTorch, and TensorFlow. If you’re building something with a custom framework, our custom model deployment mechanism allows you to deploy models of arbitrary complexity.
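To make concrete the kind of boilerplate this replaces, here is a minimal sketch of what serving a model traditionally looks like: a hand-rolled Flask endpoint that would still need to be containerized, scaled, and monitored. The toy model and route name are illustrative stand-ins, not Baseten’s API.

```python
# Hand-rolled model serving: the boilerplate a product like Baseten abstracts away.
# The threshold "model" below is a stand-in for a real trained model's predict().
from flask import Flask, request, jsonify

app = Flask(__name__)

def model_predict(features):
    # Stand-in for a real model: flag rows whose feature sum exceeds a threshold.
    return [1 if sum(row) > 12 else 0 for row in features]

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"inputs": [[5.1, 3.5, 1.4, 0.2], ...]}
    inputs = request.get_json()["inputs"]
    return jsonify({"predictions": model_predict(inputs)})

if __name__ == "__main__":
    # In production this single dev server would still need a WSGI server,
    # a container image, autoscaling, auth, and monitoring on top.
    app.run(host="0.0.0.0", port=8080)
```

Even this toy version leaves all the operational work — Docker images, load balancing, monitoring — on the data scientist’s plate, which is exactly the gap described above.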
2. Integrate with other services and data stores
Most models need some pre-processing and post-processing at inference time, and additional backend services get built to request predictions and push them to other data stores and tools. With Baseten, all this code can be written, tested, and run without standing up servers or setting up RESTful APIs by hand. Users have control over both the system and Python environments, and the endpoints are set up to scale horizontally.
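The pre/post-processing pattern described above can be sketched in plain Python. Everything here is hypothetical glue code, not a Baseten API: `fake_model_predict` stands in for a call to a deployed model, and the field names are invented for illustration.

```python
# Illustrative inference glue code: preprocess raw records, call a model,
# then postprocess results before routing them onward. All names are
# hypothetical; fake_model_predict() stands in for a real deployed model.

def preprocess(raw_records):
    # e.g. fill missing values and rescale a numeric field (cents -> dollars)
    cleaned = []
    for rec in raw_records:
        amount = rec.get("amount") or 0.0
        cleaned.append({"amount": amount / 100.0})
    return cleaned

def fake_model_predict(records):
    # Stand-in for the deployed model's inference call.
    return [r["amount"] > 1.0 for r in records]

def postprocess(raw_records, flags):
    # Attach human-readable labels; in practice this step might also
    # write results back to a database or notify another service.
    return [
        {**rec, "label": "review" if flag else "ok"}
        for rec, flag in zip(raw_records, flags)
    ]

def handle_request(raw_records):
    preds = fake_model_predict(preprocess(raw_records))
    return postprocess(raw_records, preds)
```

Hosting code like this is what normally forces a data science team to stand up and operate its own backend service around the model.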
3. Design interactive views for business users
4. Ship full-stack applications
Ultimately, our goal is to empower Data Scientists to ship faster. By drastically lowering the barriers to getting models out of notebooks and into the hands of users, we increase the rate of iteration, which in turn helps the business derive more value from machine learning.
Our early customers are already leveraging Baseten for a wide variety of applications. Just to name a few:
- Data labeller for user-generated content: Patreon’s Trust & Safety team labels user-generated content using a web app built on Baseten.
- User verification application: Primer built a full-stack application so their team could review flagged sign-ups and prevent bad actors from joining.
- Diagnostic suite to assess translation quality: SIL’s data science team uses Baseten to serve their impressive set of models that assess factors like readability, comprehensibility, and similarity in a translation.
The above are just a few examples. Really, any ML model that needs to interact with people or other services is a fantastic fit for Baseten.
Try it out today
After months of iterating with early users, we’re excited to finally share Baseten’s public beta to allow all data scientists to deploy models and build full-stack applications.
We hope you try it out — sign up today!