Ultravox v0.5 8B

Ultravox is a multimodal model that accepts both speech and text as input and generates text output. It uses Llama 3.1 8B Instruct as its backbone.
Example usage
Since Ultravox is a multimodal model, it accepts text and/or audio in a single request; in this example, the audio is fetched from a public URL. The response JSON contains a content key (under choices[0].message) that holds the generated text.
Input
from openai import OpenAI

model_id = "jwdp26kw"  # Replace with your model ID from Baseten's model dashboard

# Point the OpenAI client at the deployment's OpenAI-compatible endpoint
client = OpenAI(
    api_key="YOUR-API-KEY",
    base_url=f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1"
)

# Send a text prompt and an audio clip in the same user message
response = client.chat.completions.create(
    model="",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is Lydia like?"
                },
                {
                    "type": "audio_url",
                    "audio_url": {"url": "https://baseten-public.s3.us-west-2.amazonaws.com/fred-audio-tests/real.mp3"}
                }
            ]
        }
    ]
)

print(response)
JSON output
{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
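If you only need the generated text rather than the full response object, index into the first choice's message. This is a minimal sketch that assumes the response object returned by the example above; the field names mirror the JSON shown here.

# Assumes `response` from the example above
answer = response.choices[0].message.content
print(answer)

# Token accounting, mirroring the usage block in the JSON output
usage = response.usage
print(f"{usage.prompt_tokens} prompt + {usage.completion_tokens} completion = {usage.total_tokens} total tokens")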