Llama 3.3 Nemotron 49B Super - NVIDIA NIM

A high-efficiency model with leading accuracy for reasoning, tool calling, chat, and instruction following.

Example usage

Input
import os

import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "user", "content": "Write a limerick about the wonders of GPU computing."},
]
data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 512
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True
)

# Decode the stream as UTF-8 so multi-byte characters split across chunks are handled
res.encoding = "utf-8"

# Print the generated tokens as they get streamed
for content in res.iter_content(chunk_size=None, decode_unicode=True):
    print(content, end="", flush=True)
JSON output
null
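
The request in the example above can be factored into a small helper so that the URL, headers, and payload are assembled in one place. This is only a sketch: `build_predict_request` is a hypothetical name, not part of any Baseten SDK, and it reuses only the endpoint pattern and payload fields shown in the example.

```python
def build_predict_request(model_id, api_key, prompt, stream=True, max_new_tokens=512):
    """Return (url, headers, payload) for the predict endpoint used in the example.

    Hypothetical helper: it only assembles the same values as the example above.
    """
    if not model_id:
        # The example ships with an empty model id that must be filled in first
        raise ValueError("model_id must be set to your deployed model's id")
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    headers = {"Authorization": f"Api-Key {api_key}"}
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        "max_new_tokens": max_new_tokens,
    }
    return url, headers, payload
```

The returned values can be passed straight to `requests.post(url, headers=headers, json=payload, stream=True)`. Failing early on an empty `model_id` avoids sending a request to a malformed hostname.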

large language models

Model API
LLM

DeepSeek-V3

V3 - SGLang - B200

LLM

Qwen 3 4B

V3 - TRT-LLM - H100

LLM

Qwen 3 32B

V3 - TRT-LLM - H100

NVIDIA models

LLM

Llama 3.1 Nemotron 70B

3.1 - Nemotron - A100

LLM

Llama 3.1 Nemotron Ultra 253B

3.1 - Nemotron - TRT-LLM - H100

🔥 Trending models