Deploy StableLM with Truss

Stability AI launches StableLM

Stability AI recently announced the ongoing development of the StableLM series of language models and released a set of checkpoints for them. Trained on about 1.5 trillion tokens of content with relatively few parameters (the release includes three-billion- and seven-billion-parameter models), these models are well suited to conversational and coding-related tasks.

However, running these models for inference can be challenging given their hardware requirements. With Baseten and Truss, it can be dead simple. Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently, while Truss provides a seamless bridge from model development to model delivery.

You can see the full code for this project in the basetenlabs/truss-examples repository on GitHub.

Deploying StableLM

Four models were released: base and tuned variants, each in 3B and 7B parameter sizes.

You can modify the load method in model.py to select the version you'd like to deploy.

model_name = "stabilityai/stablelm-tuned-alpha-7b"

# Options:
# - "stabilityai/stablelm-base-alpha-7b"
# - "stabilityai/stablelm-tuned-alpha-7b"
# - "stabilityai/stablelm-base-alpha-3b"
# - "stabilityai/stablelm-tuned-alpha-3b"
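The four checkpoint IDs follow a regular pattern (base or tuned, 3B or 7B), so a small helper can pick one from two settings and reject typos before any weights are downloaded. This is a hypothetical helper for illustration, not part of the Truss in the repository; the name `stablelm_checkpoint` is an assumption.

```python
# Hypothetical helper (not from the Truss repo): build one of the four
# StableLM Alpha checkpoint IDs listed above from a size and a variant.
VALID_CHECKPOINTS = {
    "stabilityai/stablelm-base-alpha-7b",
    "stabilityai/stablelm-tuned-alpha-7b",
    "stabilityai/stablelm-base-alpha-3b",
    "stabilityai/stablelm-tuned-alpha-3b",
}


def stablelm_checkpoint(size: str = "7b", tuned: bool = True) -> str:
    """Return the Hugging Face model ID for the requested size and variant."""
    variant = "tuned" if tuned else "base"
    model_name = f"stabilityai/stablelm-{variant}-alpha-{size.lower()}"
    # Fail early on anything outside the four released checkpoints.
    if model_name not in VALID_CHECKPOINTS:
        raise ValueError(f"Unknown StableLM checkpoint: {model_name}")
    return model_name


if __name__ == "__main__":
    print(stablelm_checkpoint("3b", tuned=False))  # stabilityai/stablelm-base-alpha-3b
```

Inside the load method, `model_name = stablelm_checkpoint("7b", tuned=True)` would reproduce the default shown above.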

Configuring GPU resources for StableLM

We found that this model runs reasonably fast on an A10G GPU; you can configure the hardware you'd like in config.yaml:

...
resources:
  cpu: "3"
  memory: 14Gi
  use_gpu: true
  accelerator: A10G
...
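A quick back-of-the-envelope check shows why a single A10G (24 GB of GPU memory) is enough. The sketch below assumes the weights are loaded in 16-bit precision (2 bytes per parameter) and uses the nominal parameter counts from the model names; real counts differ slightly, and the KV cache and activations (which grow with num_beams and max_new_tokens) need headroom on top of the weights.

```python
# Rough estimate of the GPU memory needed just to hold the model weights.
# Assumptions: nominal parameter counts and fp16/bf16 weights (2 bytes each).
# The KV cache and activations are NOT included.
A10G_MEMORY_GB = 24  # NVIDIA A10G GPU memory


def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Return the size of the weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


for name, params in [("3b", 3_000_000_000), ("7b", 7_000_000_000)]:
    gb = weight_memory_gb(params)
    print(f"{name}: ~{gb:.0f} GB of weights, fits on an A10G: {gb < A10G_MEMORY_GB}")
```

In fp32 (4 bytes per parameter) the 7B weights alone would come to about 28 GB, more than an A10G holds, so half precision matters for the larger model on this GPU.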

StableLM parameters

The usual GPT-style generation parameters are passed straight through to the model at inference time:

  • max_new_tokens (default: 64)

  • temperature (default: 0.5)

  • top_p (default: 0.9)

  • top_k (default: 0)

  • num_beams (default: 4)
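One way to picture the pass-through: a request supplies a prompt plus any of the parameters above, and anything left out falls back to its default before being handed to the model's generate call. This is a minimal sketch; the `prompt` key and the `build_generate_kwargs` name are assumptions, not the exact code in the Truss's model.py.

```python
from typing import Any, Dict, Tuple

# Defaults as listed above.
DEFAULT_PARAMS: Dict[str, Any] = {
    "max_new_tokens": 64,
    "temperature": 0.5,
    "top_p": 0.9,
    "top_k": 0,
    "num_beams": 4,
}


def build_generate_kwargs(request: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Split a request into the prompt and generation kwargs (hypothetical)."""
    request = dict(request)  # copy so the caller's dict is not mutated
    prompt = request.pop("prompt")
    # Forward only the known parameters; values in the request override defaults.
    kwargs = {key: request.get(key, default) for key, default in DEFAULT_PARAMS.items()}
    return prompt, kwargs


prompt, kwargs = build_generate_kwargs({"prompt": "Write a haiku about GPUs.", "temperature": 0.8})
print(kwargs)
```

Filtering to known keys keeps unexpected request fields away from the generate call; the actual Truss may forward parameters differently.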

Adding system prompts for use in chatbots

If you're using the tuned versions in a chatbot, prepend the system prompt described in the StableLM README to the input message:

system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"
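The same tags can be extended to a multi-turn chat by appending each earlier exchange in order and ending with an open `<|ASSISTANT|>` tag for the model to complete. This is a sketch under that assumption: the StableLM README shows only the single-turn form, and `build_chat_prompt` is a hypothetical helper.

```python
from typing import List, Tuple


def build_chat_prompt(system_prompt: str, history: List[Tuple[str, str]], user_message: str) -> str:
    """Assemble a StableLM-tuned prompt from earlier (user, assistant) turns.

    Assumption: prior turns are simply concatenated using the same tags.
    """
    parts = [system_prompt]
    for user_turn, assistant_turn in history:
        parts.append(f"<|USER|>{user_turn}<|ASSISTANT|>{assistant_turn}")
    # End with an open assistant tag so the model writes the next reply.
    parts.append(f"<|USER|>{user_message}<|ASSISTANT|>")
    return "".join(parts)


system_prompt = "<|SYSTEM|># StableLM Tuned (Alpha version)\n"  # shortened for brevity
print(build_chat_prompt(system_prompt, [("Hi!", "Hello! How can I help?")], "What's your mood today?"))
```

With an empty history, the helper produces exactly the single-turn prompt shown above.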

Deploy StableLM to Baseten with Truss

Deploying the truss is easy; simply run:

pip install --upgrade truss
git clone https://github.com/basetenlabs/truss-examples
cd truss-examples/stablelm
truss push

Once deployed to Baseten, StableLM is available behind a REST API for immediate use in production. From there you can take advantage of our auto-scaling resources to ensure efficient, low-latency performance even in high-traffic scenarios. Get started today with $30 of free credits.
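Calling the deployed model is an ordinary HTTPS POST. The sketch below only builds the request with Python's standard library and does not send it. The endpoint URL, the `Api-Key` authorization header format, and the JSON body shape are assumptions, so copy the exact values from your model's page in the Baseten dashboard; `MODEL_ID` and the `BASETEN_API_KEY` environment variable are placeholders.

```python
import json
import os
import urllib.request

# Placeholder/assumption: take the real endpoint URL and auth format from
# the Baseten dashboard for your deployment.
ENDPOINT_URL = "https://app.baseten.co/models/MODEL_ID/predict"


def build_predict_request(prompt: str, api_key: str, **params) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the model endpoint."""
    body = json.dumps({"prompt": prompt, **params}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=body,
        headers={"Authorization": f"Api-Key {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request(
    "<|USER|>What's your mood today?<|ASSISTANT|>",  # system prompt omitted for brevity
    os.environ.get("BASETEN_API_KEY", "YOUR_API_KEY"),
    max_new_tokens=128,
)
# To actually send it once the placeholders are filled in:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Any of the generation parameters listed earlier can be passed as extra keyword arguments, assuming the deployed Truss reads them from the request body.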