Qwen3 Next 80B A3B Thinking

Qwen3-Next-80B-A3B-Thinking is the first installment in the Qwen3-Next series, featuring hybrid attention, high-sparsity Mixture-of-Experts (MoE), stability optimizations, and multi-token prediction (MTP).
Model details
Example usage
This example code shows how to call Qwen3-Next-80B-A3B-Thinking using the OpenAI client. You can also make requests to the /predict endpoint with a message, or to the /v1/completions endpoint with a prompt.
Qwen3-Next-80B-A3B-Thinking supports only thinking mode. To enforce model thinking, the default chat template automatically includes an opening <think> tag, so it is normal for the model's output to contain only a closing </think> tag without an explicit opening <think> tag.
Qwen3-Next-80B-A3B-Thinking may generate longer thinking content than its predecessor, so we strongly recommend using it for highly complex reasoning tasks.
# You can use this model with any of the OpenAI clients in any language!
# Simply change the API key to get started

from openai import OpenAI

model_id = "YOUR_MODEL_ID_HERE"
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url=f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-Next-80B-A3B-Thinking",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write FizzBuzz in Python"},
    ],
)

print(response.choices[0].message.content)
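Because the chat template injects the opening <think> tag, the returned content typically consists of the reasoning, a closing </think> tag, and then the final answer. Below is a minimal sketch for separating the two; the helper name is ours, not part of any SDK, and it assumes the output shape described above.

```python
def split_thinking(content: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer).

    Assumes at most one closing </think> tag and no opening
    <think> tag, as produced by the default chat template.
    """
    marker = "</think>"
    if marker in content:
        reasoning, answer = content.split(marker, 1)
        return reasoning.strip(), answer.strip()
    # No closing tag found: treat the whole output as the answer
    return "", content.strip()

reasoning, answer = split_thinking(
    "First, iterate from 1 to 100...</think>Here is FizzBuzz in Python."
)
```

You would apply this to `response.choices[0].message.content` from the example above before showing the answer to end users.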
Example response:

{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
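The usage block in the response reports token counts, which are useful for logging and cost tracking. A small sketch reading those fields from a raw JSON payload like the example above (with the OpenAI client you would read `response.usage.total_tokens` on the returned object instead):

```python
import json

# Trimmed example payload containing only the fields used below
raw = """
{
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183
  }
}
"""

usage = json.loads(raw)["usage"]
# total_tokens is the sum of prompt and completion tokens
print(f"prompt={usage['prompt_tokens']} completion={usage['completion_tokens']}")
```

Note that with thinking models, the reasoning tokens are billed as part of `completion_tokens`, so long thinking traces directly increase completion-token usage.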