Zephyr 7B Alpha

A seven-billion-parameter LLM fine-tuned from Mistral 7B for chat and assistant use cases.

Deploy Zephyr 7B Alpha behind an API endpoint in seconds.


Example usage

OpenAI Chat Completions Streaming Example

This code example shows how to invoke the model with the OpenAI Chat Completions API. The model has three main inputs:

  1. messages: A list of JSON objects. Each object must have a role key whose value is either user or assistant, and a content key holding the text passed to the large language model.

  2. stream: Setting this to True streams the tokens back as they are generated.

  3. max_tokens: Controls the maximum length of the generated output.
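For multi-turn conversations, the messages list simply alternates between the two roles. A minimal sketch of the structure described above:

```python
# Multi-turn conversations alternate between the user and assistant roles;
# the final entry is the user message the model should respond to.
messages = [
    {"role": "user", "content": "What is a zephyr?"},
    {"role": "assistant", "content": "A zephyr is a gentle breeze."},
    {"role": "user", "content": "Where does the word come from?"},
]

# Every entry needs both a role and a content key.
assert all({"role", "content"} <= set(m) for m in messages)
```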

Because this code example streams the tokens as they are generated, it does not produce a single JSON output.

Input
from openai import OpenAI
import os

# Replace the empty string with your model id below
model_id = ""

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url=f"https://bridge.baseten.co/{model_id}/v1"
)

# Call model endpoint
response = client.chat.completions.create(
    model="zephyr-7b-alpha",
    messages=[
        {"role": "user", "content": "What is a zephyr?"}
    ],
    temperature=0.9,
    max_tokens=128,
    stream=True
)

# Print the generated tokens as they get streamed
for chunk in response:
    print(chunk.choices[0].delta.content)
Output
[
    "A",
    "zephyr",
    "is",
    "a",
    "gentle",
    "...."
]
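Note that with the OpenAI SDK the final streamed chunk's delta.content can be None, so printing it directly may emit a stray "None". A small helper (a sketch, not part of the example above) can filter that out and join the tokens into one string:

```python
def collect_stream(chunks):
    """Join the streamed chunk contents into one string, skipping
    chunks whose delta has no content (e.g. the final chunk)."""
    parts = []
    for chunk in chunks:
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)
```

Calling collect_stream(response) in place of the print loop would then return the complete generation as a single string.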

Streaming Example Using REST API

Using the OpenAI Chat Completions API is optional. You can also make a REST API call using the requests library. To invoke the model this way, you need the same three inputs: messages, stream, and max_new_tokens (note that the parameter name differs from the OpenAI client's max_tokens).

Because this code example streams the tokens as they are generated, it does not produce a single JSON output.

Input
import requests
import os

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
  {"role": "user", "content": "What is a zephyr?"}
]

data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 128,
    "temperature": 0.9
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True
)

# Print the generated tokens as they get streamed
for content in res.iter_content():
    print(content.decode("utf-8"), end="", flush=True)
Output
[
    "A",
    "zephyr",
    "is",
    "a",
    "gentle",
    "...."
]
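One caveat: iter_content() with its default chunk size yields one byte at a time, so a multi-byte UTF-8 character can be split across chunks and make decode("utf-8") raise. An incremental decoder (a sketch, not part of the Baseten example) buffers partial sequences across chunk boundaries:

```python
import codecs

def decode_stream(byte_chunks):
    """Yield text decoded from raw byte chunks, buffering partial
    multi-byte UTF-8 sequences until they are complete."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush any remaining buffered bytes at end of stream
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

The print loop would then become: for text in decode_stream(res.iter_content()): print(text, end="", flush=True).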

Non-Streaming Example Using REST API

If you don't want to stream the tokens, simply set the stream parameter to False.

The output is the entire text generated by the model.

Input
import requests
import os

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
  {"role": "user", "content": "What is a zephyr?"}
]

data = {
    "messages": messages,
    "stream": False,
    "max_new_tokens": 128,
    "temperature": 0.9
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

# Print the output of the model
print(res.json())
JSON output
{
    "output": "<|assistant|>\n A zephyr is a gentle, light breeze, especially one blowing from the west in ancient Greek mythology. The word is derived from the Greek word ζέφυρος (zéphyros) which is named after the Greek god of the west wind, Zephyrus. In modern usage, zephyr refers to a light and soft wind, often used to describe winds with speeds under 10 miles per hour (16 kilometers per hour)."
}
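As the sample response shows, the generated text is returned under the output key and may begin with the model's chat-template tag. A small helper (an assumption based on the sample response above, not an official API) can extract the clean text:

```python
def extract_text(response_json):
    """Return the generated text from the response body, removing the
    leading <|assistant|> chat-template tag if present."""
    text = response_json["output"]
    prefix = "<|assistant|>"
    if text.startswith(prefix):
        text = text[len(prefix):].lstrip()
    return text
```

In the non-streaming example, print(extract_text(res.json())) would then print just the generated answer.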
