Building a data labeler with Baseten

Baseten makes it easy to build user-facing applications with machine learning models. Working closely with users in our closed beta, we’ve repeatedly heard that they need to label data to train their models, often before they even have a model. Baseten helps here too! Data scientists can use Baseten to quickly build an API and craft a UI for a user-facing data-labeling app, without first deploying a model with Baseten. In this post, we’ll build a simple image labeling app using Baseten.

Getting started

So you have an exciting new problem you want to use machine learning for. The Jupyter notebook is lined up, and you even have a model architecture you want to experiment with. But…you don’t have any labeled data. Luckily, you do have a great team that is happy to help you label the data.

Let’s walk through building a basic image labeler app in Baseten.

The labeling workflow for your team of labelers will look something like this:

  • Labelers will sign in to a Baseten-powered app
  • They’ll see images and options for different labels to apply
  • When labelers use the app, their labels will be saved, along with metadata identifying the labeler

As the app creator, you need to do the following in Baseten:

  • Seed the images that need to be labeled
  • Create an interface that shows the images (We call these interfaces “views” in Baseten)
  • Create an input that allows the labeler to select labels for each image that’s shown
  • Save the labels in a database
  • Add labelers as operators of your app

Setting up the data to be labeled

There are two common ways to seed data in Baseten. With either option, you can write queries against this data in the Baseten query editor and save these queries for use in worklets and views.

Option 1: Connect an external database to Baseten using a data connection. Baseten supports integrations with Postgres, Snowflake, MySQL, and BigQuery.

Option 2: Write a worklet to fetch data from a third-party source and save it to your Baseten-provided Postgres database.
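
For Option 2, a seeding node might look something like the sketch below. It reuses the context.classes and context.session pattern that the save worklet later in this post relies on. The RestaurantPhoto class, the 'image_urls' input key, and the return shape are hypothetical names chosen for illustration, and the step that actually fetches the URLs from the third party is left out.

```python
def seed_images(node_input, context):
    # Assumption: the caller passes a list of image URLs under 'image_urls'.
    image_urls = node_input.get('image_urls', [])

    # Drop blanks and duplicates while keeping the original order.
    seen = set()
    unique_urls = []
    for url in image_urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique_urls.append(url)

    for url in unique_urls:
        # Hypothetical model class; substitute the class backing your own table.
        photo = context.classes.RestaurantPhoto(image_url=url)
        context.session.add(photo)

    return {'success': True, 'seeded': len(unique_urls)}
```

Deduplicating before the insert keeps the same image from being queued for labeling twice if the third-party source returns it more than once.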

Next, create a query to retrieve this data. This query will be referenced to populate the data in the labeling view.
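
As a rough illustration of what such a query could look like, the sketch below runs a "not yet labeled" query against an in-memory SQLite database from Python. The table and column names (restaurant_photos, restaurant_photo_labels, image_url) are assumptions made for this example; in Baseten you would write the query itself in the query editor against your own tables.

```python
import sqlite3

# Hypothetical schema: one table of images, one table of saved labels.
UNLABELED_IMAGES_SQL = """
SELECT p.image_url
FROM restaurant_photos AS p
LEFT JOIN restaurant_photo_labels AS l ON l.image_url = p.image_url
WHERE l.image_url IS NULL
ORDER BY p.image_url
"""

def fetch_unlabeled(conn):
    # Return the URLs of images that have no label row yet.
    return [row[0] for row in conn.execute(UNLABELED_IMAGES_SQL)]

# Small in-memory dataset to try the query against.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE restaurant_photos (image_url TEXT PRIMARY KEY)')
conn.execute('CREATE TABLE restaurant_photo_labels (image_url TEXT, meal TEXT)')
conn.executemany(
    'INSERT INTO restaurant_photos VALUES (?)',
    [('a.jpg',), ('b.jpg',), ('c.jpg',)],
)
conn.execute("INSERT INTO restaurant_photo_labels VALUES ('b.jpg', 'lunch')")

unlabeled = fetch_unlabeled(conn)
print(unlabeled)
```

The LEFT JOIN with an IS NULL filter keeps only images that have no matching label row, so each image drops out of the labeling view once a label has been saved for it.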

Creating a data labeling view

Now that the data is ready to be labeled, you can create the labeling view.

Let’s walk through building a simple labeling view, made up of three main components:

  • An image viewer
  • Basic navigation (a Submit button)
  • A label selector

First, create a new view and add an image component to it.

To show unlabeled images in the view, you need to associate the query you wrote to retrieve the data with the image component. Select Query as the image URI type and then select the query you just wrote from the dropdown.

Next, add a radio button component to allow the operator to select from a few different labels for the image. Or, use a select component to allow the operator to select from a longer list of labels for the image.

Lastly, add a Submit button using a button component that takes the operator’s label selections and calls a worklet to save the labels. This worklet doesn’t exist yet, so we’ll create it in the next step and then come back to connect it to the button.

Saving the labeled data

To save the labeled data, create a new worklet. The worklet can be very simple: a single node that takes the URL of the image being labeled (for example, an S3 URL), attaches the label and metadata that identifies the labeler, saves the record, and returns. You can use the SQLAlchemy bindings that are built into Baseten. The code looks something like this:


def save_label(node_input, context):
    # The image URL is required; the label fields fall back to defaults.
    image_url = node_input['image_url']
    meal = node_input.get('meal', 'breakfast')
    ambiance = node_input.get('ambiance', 'casual')
    labeler_name = node_input.get('labeler_name', 'Anonymous labeler')

    # Build a new label record tied to the image and the labeler.
    label = context.classes.RestaurantPhotoLabel(
        image_url=image_url,
        meal=meal,
        ambiance=ambiance,
        labeler_name=labeler_name
    )

    # Add the record to the database session.
    context.session.add(label)

    return {
        'success': True,
    }

Now that you have your worklet, you can go back to the Submit button on your data labeling view and have the button call this worklet.
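
Note that the node above falls back to default labels when a field is missing. If you would rather reject incomplete or unexpected submissions, a small validation helper along these lines could run before the save. This is only a sketch: the ALLOWED_LABELS options and the validate_label_input name are hypothetical, and the values should match whatever choices your view actually offers.

```python
# Hypothetical label options; replace them with the choices shown in your view.
ALLOWED_LABELS = {
    'meal': {'breakfast', 'lunch', 'dinner'},
    'ambiance': {'casual', 'formal'},
}

def validate_label_input(node_input):
    # The image URL is required; without it the label cannot be tied to an image.
    image_url = node_input.get('image_url')
    if not image_url:
        raise ValueError('image_url is required')

    cleaned = {'image_url': image_url}
    for field, allowed in ALLOWED_LABELS.items():
        value = node_input.get(field)
        if value not in allowed:
            raise ValueError(
                f'{field!r} must be one of {sorted(allowed)}, got {value!r}'
            )
        cleaned[field] = value

    # The labeler name stays optional, mirroring the default used when saving.
    cleaned['labeler_name'] = node_input.get('labeler_name', 'Anonymous labeler')
    return cleaned
```

Raising an error instead of substituting a default means a misconfigured button shows up as a failed submission rather than as a row of incorrect labels.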

Testing the end-to-end workflow

You can test what labelers will experience once the app is shared with them by previewing the view you just built.

Try out our demo data labeler app here!

This is a good starting point upon which you can iterate. You can add lots of additional functionality to your app in the future, including initial values to look at previously labeled images or a queue view to holistically understand which images have already been labeled and which images still need to be labeled. If you want to get even fancier, you can add an automatic training step that gets triggered every time n images get labeled.
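
The "every n images" trigger mentioned above could be decided with a check like the one sketched here. It assumes you can count the labeled images after each save; should_trigger_training and its parameters are hypothetical names, and how the training job is actually started is left out.

```python
def should_trigger_training(labeled_count, batch_size):
    # Fire only when the running total lands exactly on a multiple of batch_size.
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    return labeled_count > 0 and labeled_count % batch_size == 0
```

Called once after each saved label, this returns True exactly once per batch_size labels, so a training step would not be started repeatedly for the same batch.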

Sharing the application with operators

Use Baseten’s member management functionality to add the team members who will help label data, and assign them an operator role. Invited operators will receive an email prompting them to create an account and join your Baseten organization. Once they’ve created their account, they’ll be able to start labeling images in your data labeling app.

What’s next?

We’re here to answer any questions you have and would love to see what you build with Baseten. Send us an email!

If you don’t yet have access to Baseten, you can sign up for free here.