Qwen3 Omni Thinker

An "omni" model that can process both image and audio input
Example usage
Qwen3 Omni is compatible with the OpenAI SDK and accepts three input modalities: text, image, and audio. The "Thinker" variant of the model, implemented here, returns text.
{
  "model": "qwen3-omni",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe what you see and hear."},
        {
          "type": "image_url",
          "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}
        },
        {
          "type": "audio_url",
          "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}
        }
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.7,
  "stream": false
}
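The `content` array in the request above carries one part per modality. As a minimal sketch, the helper below assembles that array from a prompt plus optional image and audio URLs, using only the part shapes shown in the example payload. The name `build_user_content` is hypothetical and not part of any SDK.

```python
import json
from typing import Optional


def build_user_content(text: str,
                       image_url: Optional[str] = None,
                       audio_url: Optional[str] = None) -> list:
    """Hypothetical helper: build the multimodal `content` list for a user message."""
    # The text part always comes first, as in the example payload.
    parts = [{"type": "text", "text": text}]
    if image_url is not None:
        parts.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_url is not None:
        parts.append({"type": "audio_url", "audio_url": {"url": audio_url}})
    return parts


payload = {
    "model": "qwen3-omni",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": build_user_content(
                "Describe what you see and hear.",
                image_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg",
                audio_url="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav",
            ),
        },
    ],
    "max_tokens": 2048,
    "temperature": 0.7,
    "stream": False,
}

# Serialize to the same JSON structure as the request body shown above.
print(json.dumps(payload, indent=2))
```

Omitting `image_url` or `audio_url` simply leaves that part out, so the same helper covers text-only, image-only, and audio-only prompts.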
Input
from openai import OpenAI
import os

# The API key is read from the BASETEN_API_KEY environment variable.
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1"
)

# Send one user message containing a text, an image, and an audio part.
resp = client.chat.completions.create(
    model="qwen3-omni",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image and audio content."},
            {"type": "image_url", "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"}},
            {"type": "audio_url", "audio_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"}}
        ]}
    ],
    max_tokens=2048,
    temperature=0.7,
    stream=False,
)

# Print the model's text reply.
print(resp.choices[0].message.content)
JSON output
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "qwen3-omni",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "I see several parked cars in front of a building and hear a short cough."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 512,
    "completion_tokens": 24,
    "total_tokens": 536
  }
}
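To make the response shape above concrete, here is a small sketch that parses a body of that form with the standard library and pulls out the reply text, finish reason, and token counts. The function name `summarize_completion` is an assumption for illustration. With the OpenAI SDK, the same fields are available as attributes such as `resp.choices[0].message.content` and `resp.usage.total_tokens`.

```python
import json

# Sample response body, copied from the JSON output shown above.
raw = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "qwen3-omni",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "I see several parked cars in front of a building and hear a short cough."
      }
    }
  ],
  "usage": {"prompt_tokens": 512, "completion_tokens": 24, "total_tokens": 536}
}
"""


def summarize_completion(body: dict) -> dict:
    """Hypothetical helper: extract the fields most callers need."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_completion(json.loads(raw))
print(summary["text"])
print(summary["finish_reason"], summary["total_tokens"])
```

In OpenAI-compatible responses, a `finish_reason` of `"length"` instead of `"stop"` generally means the reply was cut off at `max_tokens`, so it is worth checking before using the text.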