Nemotron Ultra on Baseten

NVIDIA Nemotron Ultra, a 550B-parameter mixture-of-experts model with 55B active parameters, is now available through our Model APIs with a 202K-token context window. You can deploy it in one click, and dedicated deployments are available for larger workloads.

Call it with the OpenAI or Anthropic SDK. Tool calling, structured outputs, and opt-in reasoning are supported.

curl -X POST https://inference.baseten.co/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{
    "model": "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B",
    "messages": [
      {
        "role": "user",
        "content": "Implement Hello World in Python"
      }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true,
      "continuous_usage_stats": true
    },
    "top_p": 1,
    "max_tokens": 1000,
    "temperature": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0
  }' \
  --no-buffer
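
Because the request above sets "stream": true, the response arrives incrementally rather than as a single JSON body. The following is a minimal Python sketch of one way to send the same request and reassemble the streamed text using only the standard library. It assumes the endpoint emits OpenAI-style server-sent events (lines of the form `data: {...}`, ending with `data: [DONE]`), which this page does not spell out. The helper names `build_request`, `iter_events`, `collect`, and `stream_completion` are hypothetical and not part of any Baseten SDK, and the payload is trimmed to the fields needed for the illustration.

```python
import json
import urllib.request

# URL and model ID are copied from the curl example above.
BASE_URL = "https://inference.baseten.co/v1/chat/completions"
MODEL = "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B"


def build_request(api_key, prompt, max_tokens=1000):
    """Build a streaming POST request mirroring the curl example (trimmed payload)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "stream_options": {"include_usage": True},
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )


def iter_events(lines):
    """Yield decoded JSON chunks from an iterable of SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumption: OpenAI-style end-of-stream sentinel
            break
        yield json.loads(data)


def collect(lines):
    """Concatenate streamed text deltas; return (text, last usage object seen)."""
    parts, usage = [], None
    for chunk in iter_events(lines):
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
        usage = chunk.get("usage") or usage
    return "".join(parts), usage


def stream_completion(api_key, prompt):
    """Send the request and return (text, usage). Requires network access and a valid key."""
    with urllib.request.urlopen(build_request(api_key, prompt)) as resp:
        return collect(resp)
```

Calling `stream_completion(os.environ["BASETEN_API_KEY"], "Implement Hello World in Python")` would, under the assumptions above, return the full generated text along with the token usage reported in the stream. Keeping the parsing in `collect` separate from the network call makes it possible to check the logic against recorded lines without contacting the API.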

For more information, see our docs or get started by talking to us.
