Ultravox v0.5 8B

Ultravox is a multimodal model that accepts both speech and text as input and generates text as output, like a standard chat model. It uses Llama 3.1 8B Instruct as its backbone.

Example usage

Because this is a multimodal model, it accepts text, audio, or both in a single message. In the example below, the audio file is fetched from a public URL.
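To illustrate how one user message can carry text, audio, or both, here is a small sketch that assembles the content list. It assumes the same "text" and "audio_url" part shapes used in the request example on this page; build_user_message is a hypothetical helper, not part of the OpenAI SDK or of Baseten's API.

```python
from typing import Any, Optional


def build_user_message(text: Optional[str] = None,
                       audio_url: Optional[str] = None) -> dict[str, Any]:
    """Build one chat message whose content mixes text and/or audio parts.

    Hypothetical helper: the part shapes mirror the "text" and "audio_url"
    entries in the request example on this page.
    """
    if text is None and audio_url is None:
        raise ValueError("provide text, audio_url, or both")
    content: list[dict[str, Any]] = []
    if text is not None:
        # Plain text part of the prompt
        content.append({"type": "text", "text": text})
    if audio_url is not None:
        # Audio part: a URL the server can download the audio from
        content.append({"type": "audio_url", "audio_url": {"url": audio_url}})
    return {"role": "user", "content": content}


# Text plus audio, matching the request example on this page
message = build_user_message(
    text="What is Lydia like?",
    audio_url="https://baseten-public.s3.us-west-2.amazonaws.com/fred-audio-tests/real.mp3",
)
print(message["content"][1]["type"])  # audio_url
```

The resulting dict can be placed directly in the messages list passed to client.chat.completions.create.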

The response is a JSON object. The generated text is in the content field of the assistant message, at choices[0].message.content.

Input
from openai import OpenAI

model_id = "jwdp26kw"  # Replace with your model ID from Baseten's model dashboard

client = OpenAI(
    api_key="YOUR-API-KEY",  # Replace with your Baseten API key
    # The model ID in the URL selects which Baseten deployment receives the request
    base_url=f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1"
)

response = client.chat.completions.create(
    model="",  # Left empty, as in Baseten's example; base_url already identifies the model
    messages=[
        {
            "role": "user",
            "content": [
                # Text part of the prompt
                {
                    "type": "text",
                    "text": "What is Lydia like?"
                },
                # Audio part: a publicly reachable audio file
                {
                    "type": "audio_url",
                    "audio_url": {"url": "https://baseten-public.s3.us-west-2.amazonaws.com/fred-audio-tests/real.mp3"}
                }
            ]
        }
    ]
)

# Prints the full response; the generated text is response.choices[0].message.content
print(response)
JSON output
{
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "[Model output here]",
                "role": "assistant",
                "audio": null,
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1741224586,
    "model": "",
    "object": "chat.completion",
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 145,
        "prompt_tokens": 38,
        "total_tokens": 183,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
    }
}
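If you handle the response as raw JSON (for example, from a plain HTTP client instead of the OpenAI SDK, where the same value is response.choices[0].message.content), the text can be pulled out as sketched below. This is a minimal standard-library sketch; extract_text is a hypothetical helper, and the payload is a trimmed copy of the JSON output above. It also checks that total_tokens equals prompt_tokens plus completion_tokens (38 + 145 = 183).

```python
import json

# Trimmed copy of the JSON output above
raw = """
{
    "id": "143",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "[Model output here]", "role": "assistant"}
        }
    ],
    "usage": {"completion_tokens": 145, "prompt_tokens": 38, "total_tokens": 183}
}
"""


def extract_text(payload: str) -> str:
    """Return the assistant text from the first choice of a chat completion."""
    data = json.loads(payload)
    return data["choices"][0]["message"]["content"]


usage = json.loads(raw)["usage"]
print(extract_text(raw))  # [Model output here]
# Token accounting: prompt tokens plus completion tokens give the total
print(usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"])  # True
```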

Transcription models

Ultravox v0.6 70B (Fixie): v0.6, H100
Whisper Streaming Large v3 (OpenAI): H100 MIG 40GB
Whisper (best performance) (OpenAI): V3, H100 MIG 40GB

Fixie AI models

Ultravox v0.6 70B: v0.6, H100
Ultravox v0.5 8B: v0.5, H100
