
Qwen3 Omni Thinker

An "omni" model that accepts text, image, and audio input and returns text.


Example usage

Qwen3 Omni is compatible with the OpenAI SDK. It accepts multiple input modalities in a single request: text, image, and audio. The "Thinker" variant of the model, implemented here, returns text.

{
  "model": "qwen3-omni",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see and hear."},
        {
          "type": "image_url",
          "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}
        },
        {
          "type": "audio_url",
          "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}
        }
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
Input
import os

from openai import OpenAI

# The API key is read from the environment; "model-xxxxxx" in the base URL
# is a placeholder for the deployed model's ID.
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1",
)

# One user message can mix text, image, and audio content parts.
resp = client.chat.completions.create(
    model="qwen3-omni",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image and audio content."},
            {"type": "image_url", "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}},
            {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}},
        ]},
    ],
    max_tokens=2048,
    temperature=0.7,
    stream=False,
)

# The Thinker variant returns text, so the reply is in message.content.
print(resp.choices[0].message.content)
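
Because a request mixes up to three kinds of content parts, it can help to assemble the body in one place. The following is a minimal sketch, not part of the official example: `build_omni_request` is a hypothetical helper name, and the defaults simply mirror the values in the request shown earlier. It runs without network access and only builds the dictionary.

```python
import json


def build_omni_request(prompt, image_url=None, audio_url=None,
                       model="qwen3-omni", max_tokens=2048, temperature=0.7):
    """Assemble a chat-completions request body (hypothetical helper).

    The text part is always included; image and audio parts are added
    only when a URL is supplied.
    """
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_url is not None:
        content.append({"type": "audio_url", "audio_url": {"url": audio_url}})
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": content},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }


if __name__ == "__main__":
    body = build_omni_request(
        "Describe what you see and hear.",
        image_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg",
        audio_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav",
    )
    print(json.dumps(body, indent=2))
```

Since the keys match the keyword arguments used in the SDK call, the resulting dictionary could be passed as `client.chat.completions.create(**body)`.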
JSON output
{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "qwen3-omni",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "I see several parked cars in front of a building and hear a short cough."
            }
        }
    ],
    "usage": {
        "prompt_tokens": 512,
        "completion_tokens": 24,
        "total_tokens": 536
    }
}
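
For callers that work with the raw JSON rather than the SDK object, the fields of interest can be pulled out as sketched below. This is an illustrative example only: `summarize_completion` is a hypothetical helper, and the sample string reuses the output shown above. In the chat-completions format, a `finish_reason` of `"length"` indicates the reply was cut off at `max_tokens`.

```python
import json

# Sample response copied from the JSON output above.
SAMPLE_RESPONSE = """
{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "qwen3-omni",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "I see several parked cars in front of a building and hear a short cough."
            }
        }
    ],
    "usage": {"prompt_tokens": 512, "completion_tokens": 24, "total_tokens": 536}
}
"""


def summarize_completion(raw):
    """Extract the reply text, finish reason, and token total (hypothetical helper)."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # "length" means the model stopped because it hit max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }


if __name__ == "__main__":
    summary = summarize_completion(SAMPLE_RESPONSE)
    print(summary["text"])
    print(summary["finish_reason"], summary["total_tokens"])
```

With the OpenAI SDK, the same values are available as attributes: `resp.choices[0].finish_reason` and `resp.usage.total_tokens`.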
