Store data in S3 with a data connection

While Baseten provides PostgreSQL tables for storing information from model runs, you may want to keep your data in your own databases. And for data that is ill-suited to relational databases, such as large files and key-value pairs, you'll want a more appropriate storage technology. With data connections, you can read from and write to external data sources directly from your Baseten account.

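This tutorial covers writing files to Amazon S3 through a data connection. To get started, you'll need a Baseten account and an AWS account with an S3 bucket, plus AWS credentials that can write to that bucket.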

Deploy the photo restoration starter application

If you don’t already have a copy of the photo restoration starter application deployed on your Baseten account, let’s take 30 seconds to get you a fresh copy. Head to the applications tab and click “Try it” under the “Photo restoration with GFP-GAN” starter app. 

Play around with it a bit to see how it works, as we’ll be extending its functionality after creating the data connection.

Establish a data connection

From the dropdown menu in the upper-left corner, select Data. On the data page, click on “Connections” and then click “Add connection.”

Name your connection and select S3 from the dropdown. This creates a connection to your account’s S3 resources. You’ll pick which bucket to use later, when you write code that accesses the data connection.

Enter your AWS credentials, save your data connection and return to the application builder.

Update application code

With the data connection all ready to go, it’s time to modify the code to access it. From the application builder, find the file main.py.

At the top of main.py, import the two libraries we'll need. Both are already available, so you don't need to install any packages.

```python
import requests
import io
```

Replace the function `save_restored_image` in main.py with the following code. Make sure to replace `Your S3 connection` with the name you gave your data connection and `your-bucket-name` with your S3 bucket’s name.

```python
def save_restored_image(block_input, env, context):
    predictions = block_input.get('predictions', [])
    # Nothing to save if the model returned no predictions
    # (`not predictions` already covers an empty list)
    if not predictions:
        return {'success': False}
    restored_url = predictions[0]

    # Get the S3 client from the data connection (use your connection's name)
    aws_boto_client = context.client('Your S3 connection')
    # Download the restored image stored by Baseten during the model run
    response = requests.get(restored_url)
    # Wrap the bytes in a file-like object and upload it, keyed by the URL's last segment
    s3_object = io.BytesIO(response.content)
    aws_boto_client.upload_fileobj(s3_object, 'your-bucket-name', restored_url.split('/')[-1])

    job = env.get('job')
    if job is not None:
        session = context.session
        # Assign the restored image to the job previously recorded
        job.restored_url = restored_url
        session.add(job)
    return {'restored_url': restored_url, 'success': True}
```
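The function above uses `restored_url.split('/')[-1]` as the S3 object key. If a URL ever carried a query string or fragment (for example, a presigned URL), that text would end up in the key too. The sketch below shows one way to keep only the last path segment using the standard library. Whether the restored-image URLs in your app actually include query strings is an assumption. If they don't, this helper produces the same key as the original split. `object_key_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse
import posixpath


def object_key_from_url(url: str) -> str:
    """Hypothetical helper: derive an S3 object key from the last path
    segment of a URL, ignoring any ?query string or #fragment."""
    path = urlparse(url).path        # e.g. "/outputs/restored.png"
    return posixpath.basename(path)  # last '/'-separated segment


url = "https://example.com/outputs/restored.png?X-Amz-Signature=abc"
print(url.split('/')[-1])         # restored.png?X-Amz-Signature=abc
print(object_key_from_url(url))   # restored.png
```

Note that a URL ending in `/` yields an empty key with either approach, so you may want to guard against that case.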

This modified function adds these four lines to the middle:

```python
aws_boto_client = context.client('Your S3 connection')
response = requests.get(restored_url)
s3_object = io.BytesIO(response.content)
aws_boto_client.upload_fileobj(s3_object, 'your-bucket-name', restored_url.split('/')[-1])
```

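First, the code gets an S3 client from the data connection. Next, it downloads the restored image and wraps its bytes in an in-memory, file-like object with `io.BytesIO`. The image was already stored in Baseten during the model invocation, and the function only receives its URL. That is why an HTTP GET request is needed to fetch the image content. Finally, the AWS Python SDK's `upload_fileobj` method uploads that object to your S3 bucket, using the last segment of the image URL as the object key (file name).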
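To exercise the upload step without AWS credentials or a deployed app, you could pull those four lines into a helper and run it against an in-memory stand-in for the S3 client. This is a minimal sketch under stated assumptions. `FakeS3Client`, `upload_restored_image`, and the injected `fetch` callable are hypothetical names, not part of Baseten or boto3. `fetch` takes the place of `requests.get(url).content` so the example runs offline.

```python
import io
from typing import Callable, Dict, Tuple


class FakeS3Client:
    """Hypothetical in-memory stand-in for the S3 client returned by
    context.client(...). Uploads are stored in a dict instead of AWS."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    def upload_fileobj(self, Fileobj, Bucket: str, Key: str) -> None:
        # Same positional order as boto3's S3.Client.upload_fileobj(Fileobj, Bucket, Key)
        self.objects[(Bucket, Key)] = Fileobj.read()


def upload_restored_image(client, bucket: str, restored_url: str,
                          fetch: Callable[[str], bytes]) -> str:
    """Mirror of the four added lines, with the download injected as `fetch`.
    Returns the object key that was used."""
    content = fetch(restored_url)          # download the restored image bytes
    s3_object = io.BytesIO(content)        # wrap the bytes as a file-like object
    key = restored_url.split('/')[-1]      # last URL segment becomes the key
    client.upload_fileobj(s3_object, bucket, key)
    return key


# Example usage with fake data; no network or AWS access involved
client = FakeS3Client()
key = upload_restored_image(
    client,
    "your-bucket-name",
    "https://example.com/outputs/restored.png",
    fetch=lambda url: b"fake image bytes",
)
print(key)                                        # restored.png
print(client.objects[("your-bucket-name", key)])  # b'fake image bytes'
```

Injecting both the client and `fetch` keeps the helper testable. In main.py you would pass `context.client('Your S3 connection')` and a function that returns `requests.get(url).content`.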

Head back to the application’s view and try running the photo restoration app. Within a few seconds, you’ll not only see the restored photo in the UI, but you’ll also see it in your S3 bucket!
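Besides checking the S3 console, you could confirm the upload programmatically. This is a minimal sketch that assumes `client` is a boto3 S3 client, such as the one `context.client('Your S3 connection')` appears to return, and a general purpose bucket, which lists keys in ascending order. `object_in_bucket` is a hypothetical helper name. `StubS3Client` exists only so the example runs without AWS.

```python
from typing import Any, Dict


def object_in_bucket(client: Any, bucket: str, key: str) -> bool:
    """Return True if `bucket` contains an object whose key is exactly `key`.

    Assumes `client` is a boto3 S3 client. list_objects_v2 omits 'Contents'
    when nothing matches the prefix. Because keys are listed in ascending
    order, an exact match is the first key with that prefix, so MaxKeys=1
    is enough.
    """
    response = client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))


class StubS3Client:
    """Hypothetical stand-in that mimics list_objects_v2 for a fixed set of keys."""

    def __init__(self, keys):
        self.keys = sorted(keys)

    def list_objects_v2(self, Bucket: str, Prefix: str, MaxKeys: int = 1000) -> Dict[str, Any]:
        matches = [k for k in self.keys if k.startswith(Prefix)][:MaxKeys]
        return {"Contents": [{"Key": k} for k in matches]} if matches else {"KeyCount": 0}


# Example usage with the stub; pass your real S3 client instead in practice
stub = StubS3Client(["restored.png", "restored.png.bak"])
print(object_in_bucket(stub, "your-bucket-name", "restored.png"))  # True
print(object_in_bucket(stub, "your-bucket-name", "missing.png"))   # False
```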