Llama 2 7B Chat

A seven-billion-parameter foundation model tuned for chat use cases.

Deploy Llama 2 7B Chat behind an API endpoint in seconds.


Example usage

Streaming Token Example

This code example shows how to stream the output tokens as they are generated using Python. The model has three main inputs:

  1. prompt: The input text sent to the model.

  2. stream: When set to True, tokens are streamed back as they are generated.

  3. max_length: Controls the maximum length of the output sequence.

Because this code example streams the tokens as they are generated, the response is plain text rather than JSON.

Input
import requests
import os

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "prompt": "What is the difference between a llama and an alpaca?",
    "stream": True,
    "max_length": 512
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True
)

# Print the generated tokens as they get streamed
for content in res.iter_content():
    print(content.decode("utf-8"), end="", flush=True)
Output
llamas and alpacas are ...
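One caveat with the streaming loop above: `res.iter_content()` yields raw byte chunks, and a multi-byte UTF-8 character can be split across two chunks, in which case `content.decode("utf-8")` would raise a `UnicodeDecodeError`. Below is a minimal sketch of a safer decoding loop using Python's standard-library incremental decoder; the `stream_text` helper and the sample chunks are illustrative, not part of the Baseten API:

```python
import codecs

def stream_text(chunks):
    # An incremental decoder buffers a partial multi-byte UTF-8
    # sequence until the rest of it arrives in the next chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    for chunk in chunks:
        pieces.append(decoder.decode(chunk))
    # Flush the decoder in case the stream ended mid-sequence.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)

# "ñ" is two bytes in UTF-8; these chunks split it down the middle,
# which would crash a per-chunk bytes.decode("utf-8") call.
chunks = [b"pi\xc3", b"\xb1ata"]
print(stream_text(chunks))  # piñata
```

In a real call you would pass `res.iter_content()` as `chunks`, printing each decoded piece as it arrives instead of joining at the end.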

Non-Streaming Example

If you don't want to stream the tokens, simply set the stream parameter to False.

The output is a list containing the generated text.

Input
import requests
import os

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "prompt": "What is the difference between a llama and an alpaca?",
    "stream": False,
    "max_length": 512
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

# Print the output of the model
print(res.json())
JSON output
[
    "Great question! Llamas and alpacas are both members of the camelid family, but they are different species with some distinct characteristics. Here are some key differences:\n\n1. Size: Llamas are generally larger than alpacas. Adult llamas can weigh between 280-450 pounds (127-204 kg), while adult alpacas typically weigh between 100-200 pounds (45-91 kg).\n2. Coat: Both llamas and alpacas have soft, fleecy coats, but llamas have a longer coat that can be up to 6 inches (15 cm) long, while alpacas have a shorter coat that is usually around 3 inches (7.6 cm) long.\n3. Ears: Llamas have banana-shaped ears, while alpacas have smaller, more rounded ears.\n4. Tail: Llamas have a long, bushy tail, while alpacas have a shorter, more slender tail.\n5. Habitat: Llamas originated in South America, specifically in the Andean region, while alpacas are native to the Andes mountains in Peru.\n6. Temperament: Llamas are known for their independent nature and can be more challenging to train than alpacas, which are generally easier to handle and train.\n7. Purpose: While both llamas and alpacas are raised for their fiber, llamas are often used as pack animals due to their strength and endurance, while alpacas"
]
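Since the non-streaming response is a JSON list whose first element is the generated text, you will usually want to pull that string out and fail loudly if the shape is unexpected. Here is a small illustrative sketch assuming the single-string-in-a-list shape shown above; `first_completion` is a hypothetical helper name, not part of the Baseten API:

```python
def first_completion(payload):
    # The model returns a JSON list containing one generated string;
    # treat anything else as an unexpected response shape.
    if not isinstance(payload, list) or not payload or not isinstance(payload[0], str):
        raise ValueError(f"unexpected response shape: {payload!r}")
    return payload[0]

# With a live call this would be: text = first_completion(res.json())
sample = ["Great question! Llamas and alpacas are both members of the camelid family..."]
print(first_completion(sample))
```

Checking the shape up front gives a clearer error than an `IndexError` or `TypeError` deep in downstream code when the endpoint returns something unexpected.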

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init -- example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G