Deploying Stable Diffusion on Baseten using Truss
Stable Diffusion is an open-source image generation model developed by Stability AI. It goes image for image with DALL·E 2, but unlike DALL·E’s proprietary license, Stable Diffusion’s usage is governed by the CreativeML Open RAIL-M license. While this dramatically lowers the cost of using the model, it still requires some technical aptitude to get it running, not to mention a high-end GPU.
I wanted to give Stable Diffusion a try, but didn’t want to send Nvidia a thousand dollars for a shiny new 12-gig 3080 Ti to run it on. Plus, I wanted my colleagues to be able to generate images too. So I set out to deploy the model on Baseten.
Spoiler alert: it worked. Here’s an app that you can use to interact with the deployed model.
If you’re curious about the process of deploying a cutting-edge model on Baseten, read on and I’ll walk you through the process step by step.
Prerequisites
While Stable Diffusion is a good deal simpler to run than many other big models, it still takes a few resources:
If you’re deploying the model to Baseten, you’ll need both a Baseten API key and GPU access in your workspace. If your workspace uses a paid plan, contact us to get GPU access.
If you want to serve the model locally or on another platform, you’ll need access to a machine with a CUDA-capable GPU
A Hugging Face access token with access to the Stable Diffusion model (once you have created the access token, visit the model page on Hugging Face and register to use the model; access is granted instantly)
We don’t need to download the model weights directly or otherwise use the build instructions from the Stable Diffusion GitHub repository, as we’ll get everything we need through the Hugging Face model.
Packaging the model
I used Truss, Baseten’s open source package for serving and deploying models, to prepare the model for production.
To create the Truss, I opened up Terminal and ran:
truss init ./stable-diffusion
This created the folder structure that I used to package up the model. I edited two files inside this folder: config.yaml and model/model.py.
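For reference, the generated Truss looks roughly like this (the exact scaffolding varies a bit between Truss versions); the only two files I touched are config.yaml and model/model.py:
stable-diffusion/
├── config.yaml       # server and environment configuration
├── data/             # optional bundled model artifacts
└── model/
    ├── __init__.py
    └── model.py      # model loading and inference code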
In the config file, I made three adjustments. First I added package dependencies:
requirements:
- diffusers
- transformers
- torch
Then I made sure to configure the Truss to use a GPU. Stable Diffusion requires a GPU during inference, not just training, to generate images.
resources:
cpu: 500m
memory: 512Mi
use_gpu: true
Finally, I used Truss’ secrets management feature to make sure that my model knows to look for the Hugging Face access token on Baseten.
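As a rough sketch, the addition to config.yaml looks something like this; the secret name just needs to match the key the model code reads, and the actual token value lives in Baseten rather than in the file:
secrets:
  hf_access_token: null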
Then it was time to work on the model code itself. Truss lets you quickly define a model/model.py file and then turns it into a Docker image that contains a server hosting your model. Here is the full model/model.py file, which I’ll explain in detail below.
import torch
from torch import autocast
import base64
from io import BytesIO
from typing import Dict, List

from diffusers import StableDiffusionPipeline


class Model:
    def __init__(self, **kwargs) -> None:
        self._data_dir = kwargs["data_dir"]
        self._config = kwargs["config"]
        self._secrets = kwargs.get("secrets")
        self._model = None
        self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    def load(self):
        self._model = StableDiffusionPipeline.from_pretrained(
            "CompVis/stable-diffusion-v1-4",
            revision="fp16",
            torch_dtype=torch.float16,
            use_auth_token=self._secrets["hf_access_token"],
        )
        # push model onto gpu where possible
        self._model = self._model.to(self.device)

    # helper function to convert to b64
    def convert_to_b64(self, image):
        buffered = BytesIO()
        image.save(buffered, format="JPEG")
        img_b64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
        return img_b64

    def predict(self, request: Dict) -> Dict[str, List]:
        print(self.device)
        response = {}
        response["predictions"] = []
        inputs = request["inputs"]
        prompts = list(map(lambda x: x["prompt"], inputs))

        # run inference over our prompts and pull out the resulting image
        results = []
        with autocast(self.device.type):
            for prompt in prompts:
                image = self._model(prompt)["sample"][0]
                results.append(image)

        # convert images to b64
        b64_results = list(map(lambda x: self.convert_to_b64(x), results))
        response["predictions"] = b64_results
        return response
Let's break that big chunk of code down.
In the __init__ function, the device attribute lets the model access a GPU when available. The function also loads config information and secrets.
self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
The load function loads the Stable Diffusion model. This uses your Hugging Face access token from the prerequisites section to access the weights. The model is then pushed to the GPU, when available.
def load(self):
    self._model = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        revision="fp16",
        torch_dtype=torch.float16,
        use_auth_token=self._secrets["hf_access_token"],
    )
    # push model onto gpu where possible
    self._model = self._model.to(self.device)
A helper function converts the Image object to a base64 string.
def convert_to_b64(self, image):
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_b64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_b64
Finally, the predict function actually runs inference on the model. The predict function parses the prompts out of the request, runs each prompt through the model, and does some post-processing on the resulting Image objects. After running each result through the base64 helper function, it returns a response with the encoded images. Here's the line that actually invokes the model:
image = self._model(prompt)["sample"][0]
Stable Diffusion uses Hugging Face and PyTorch, which are both supported frameworks on Truss and Baseten. So it only takes a few lines of code to load and run inference on the model in production.
Serving and deployment
Before deploying the model, I served it locally to make sure everything was working as expected. I used the Truss library in a Jupyter notebook to invoke the model:
import truss
scaffold = truss.from_directory("./stable-diffusion")
scaffold.server_predict({"inputs" : [{"prompt" : "man on moon"}]})
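The response mirrors what predict returns: a dict whose predictions list holds base64-encoded JPEGs. Continuing in the same notebook, here’s a minimal sketch (assuming server_predict returns that dict unchanged) of decoding the first prediction back into an image:
import base64
from io import BytesIO

from PIL import Image

# capture the response and decode the first base64 string back into a PIL Image
response = scaffold.server_predict({"inputs": [{"prompt": "man on moon"}]})
img_bytes = base64.b64decode(response["predictions"][0])
image = Image.open(BytesIO(img_bytes))
image.save("man_on_moon.jpg")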
Satisfied that it was working, I deployed it to Baseten with just a couple lines of code:
import baseten
scaffold = truss.from_directory("./stable-diffusion")
baseten.login("paste your Baseten API key")
baseten.deploy_truss(scaffold, model_name='stable_diffusion')
With the model deployed, I used the application builder to add a simple user interface. The demo app takes a prompt and returns an image, letting anyone try Stable Diffusion without writing a line of code. Try the app and let your creativity run wild!