Nemotron Ultra on Baseten
NVIDIA Nemotron Ultra, a 550B-parameter mixture-of-experts model with 55B active parameters, is now available through our Model APIs with a 202K-token context window. You can deploy it in one click, and dedicated deployments are available for larger workloads.
Call it with the OpenAI or Anthropic SDK. Tool calling, structured outputs, and opt-in reasoning are supported.
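As a rough illustration of the OpenAI SDK route, here is a minimal Python sketch. It assumes the `openai` package is installed and that the SDK's default `Authorization: Bearer` header is accepted in place of the `Api-Key` scheme used with curl. The helper names `build_request` and `stream_completion` are hypothetical, chosen for this example only.

```python
import os

# Endpoint and model ID as they appear in this post.
BASE_URL = "https://inference.baseten.co/v1"
MODEL = "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B"


def build_request(prompt, max_tokens=1000):
    """Build keyword arguments for a streaming chat completion (hypothetical helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Ask for token usage to be reported in the final streamed chunk.
        "stream_options": {"include_usage": True},
        "max_tokens": max_tokens,
        "temperature": 1,
        "top_p": 1,
    }


def stream_completion(prompt):
    """Yield text fragments as they arrive (hypothetical helper)."""
    # Imported lazily so the rest of the module works without the SDK installed.
    from openai import OpenAI  # third-party: pip install openai

    # Assumption: the key is accepted as a Bearer token, which is what the SDK sends.
    client = OpenAI(base_url=BASE_URL, api_key=os.environ["BASETEN_API_KEY"])
    stream = client.chat.completions.create(**build_request(prompt))
    for chunk in stream:
        # The usage-only chunk at the end carries no choices, so skip it.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage (needs network access and a valid BASETEN_API_KEY):
#   for piece in stream_completion("Implement Hello World in Python"):
#       print(piece, end="", flush=True)
```

Keeping the request construction separate from the network call makes the payload easy to inspect or adjust before sending. A comparable raw HTTP request, as a curl command: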
curl -X POST https://inference.baseten.co/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{
    "model": "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B",
    "messages": [
      {
        "role": "user",
        "content": "Implement Hello World in Python"
      }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true,
      "continuous_usage_stats": true
    },
    "top_p": 1,
    "max_tokens": 1000,
    "temperature": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0
  }' \
  --no-buffer

For more information, see our docs or get started by talking to us.