NVIDIA Nemotron Nano 12B V2 VL
Open-source vision-language model from NVIDIA for document processing
Model details
Example usage
The NVIDIA Nemotron Nano 12B V2 VL model is a 12-billion-parameter vision-language model that takes images (or videos) alongside text and generates detailed, contextual text responses, such as summaries of visual content, OCR transcriptions, or answers to questions about an image.
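The example usage on this page passes a public image URL. For document processing you will often want to send local scans, or several pages in one request. The sketch below builds an OpenAI-style user message from a prompt plus any number of images; local files are embedded as base64 `data:` URLs. It assumes the deployment accepts data URLs and multiple `image_url` parts in one message, which is standard for OpenAI-compatible servers but not confirmed here. The helper names `image_part` and `build_user_message` are hypothetical, not part of any SDK.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(source: str) -> dict:
    """Return an OpenAI-style image_url content part.

    http(s) URLs are passed through unchanged; anything else is treated as a
    local file path and embedded as a base64 data URL (assumed to be accepted
    by the deployment).
    """
    if source.startswith(("http://", "https://")):
        url = source
    else:
        path = Path(source)
        mime, _ = mimetypes.guess_type(path.name)
        if mime is None or not mime.startswith("image/"):
            raise ValueError(f"cannot infer an image MIME type for {path}")
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        url = f"data:{mime};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}


def build_user_message(prompt: str, *images: str) -> dict:
    """Build one user message: all images first, then the text prompt."""
    content = [image_part(src) for src in images]
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}
```

The result drops straight into the `messages` list of the request, for example `messages=[build_user_message("Transcribe all text on these pages.", "page1.png", "page2.png")]`. Placing images before the text mirrors the ordering used in the example request.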
Nemotron Nano 2 VL delivers improved accuracy across visual benchmarks for multi-image understanding, document intelligence, and video captioning.
Input
from openai import OpenAI
import os

# Replace with the ID of your Baseten model deployment
model_id = ""

# Configure an OpenAI-compatible client for your deployment
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url=f"https://model-{model_id}.api.baseten.co/environment/production/sync/v1",
)

# Test the model with streaming
stream = client.chat.completions.create(
    model="",  # Use the served model name from config
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/f/fa/Grayscale_8bits_palette_sample_image.png"
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image in detail.",
                },
            ],
        }
    ],
    stream=True,
)

# Print the response as it streams in
for chunk in stream:
    # Some chunks (e.g. a final usage chunk) may carry no choices or no text
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Example output:
# This is a black and white image of a bird, which appears to be a parrot,
# perched on a curved metal stand. The bird is facing the left side of the
# image. It has a curved beak and its wings are slightly folded. The bird's
# feathers are short and fluffy, and it has a large, round head. Behind the
# bird, there are what appear to be bushes or shrubs.
JSON output
{
  "id": "143",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "[Model output here]",
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1741224586,
  "model": "",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 145,
    "prompt_tokens": 38,
    "total_tokens": 183,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  }
}
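The JSON output above has the shape of a non-streaming `chat.completion` response (note `"object": "chat.completion"`), which is what the endpoint returns when `stream=True` is not set. A minimal sketch for pulling out the fields you usually need, assuming the response has already been parsed into a plain dict; `summarize_completion` is a hypothetical helper, not an SDK function.

```python
def summarize_completion(payload: dict) -> dict:
    """Extract reply text, stop reason, and token counts from a
    non-streaming chat.completion response parsed into a dict."""
    choice = payload["choices"][0]
    usage = payload.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # "length" means generation stopped at a token limit (e.g. max_tokens)
        # rather than finishing naturally with "stop"
        "truncated": choice["finish_reason"] == "length",
        # Usage fields are optional; missing counts come back as None
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

With the `openai` SDK, dropping `stream=True` from the example request returns a `ChatCompletion` object, and `response.model_dump()` converts it to a dict of this structure. In the sample, `total_tokens` is simply `prompt_tokens + completion_tokens` (38 + 145 = 183), which is the figure to track for usage-based cost.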