
NVIDIA Nemotron 3 Nano Omni: Build Multimodal Agents on Baseten

TL;DR

NVIDIA Nemotron 3 Nano Omni is an open multimodal foundation model that unifies audio, images, video, and text into a single context. Built on the Nemotron 3 Nano backbone, Nemotron 3 Nano Omni powers sub-agents in agentic workflows with leading efficiency and accuracy.

Today, Baseten is launching support for NVIDIA Nemotron 3 Nano Omni, an open unified multimodal model built for production agents.

What is NVIDIA Nemotron 3 Nano Omni?

NVIDIA Nemotron 3 Nano Omni is an open, efficient multimodal foundation model built to power sub-agents that understand and reason across video, audio, images, documents, and text in enterprise agent systems.

Most agent systems today rely on separate models for speech, vision, and language. In agentic workflows, this creates problems: separate models mean repeated inference passes, which increases latency; orchestration and error handling become more complex; and context fragments across modalities, which reduces accuracy.

NVIDIA Nemotron 3 Nano Omni takes a different approach. It is a single multimodal reasoning model that enables agents to reason and perceive across modalities within a unified loop with complete deployment control and efficient performance. 

Nemotron 3 Nano Omni combines audio and vision encoders into a unified multimodal architecture, which eliminates the need for separate perception models. This enables agents to complete tasks faster at scale, and simplifies agent development. 

There are three architectural choices in particular that make Omni efficient:

  • A latent MoE design that improves memory and compute efficiency

  • 3D convolutional layers let the model extract spatial and temporal features together, so it knows how visuals change over time

  • Efficient video sampling selectively processes the most dynamic parts of long videos instead of scanning every frame

Nemotron 3 Nano Omni extends the efficiency and accuracy of Nemotron 3 Nano across different modalities.
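NVIDIA hasn't published the implementation details of the video sampling step, but the core idea can be sketched with simple frame differencing: score each frame by how much it changes from its predecessor, and keep only the highest-motion frames. This is a minimal illustration, not the model's actual sampler; the function name and grayscale-frame assumption are ours.

```python
import numpy as np

def select_dynamic_frames(frames: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k highest-motion frames, in time order.

    frames: (T, H, W) array of grayscale frames.
    """
    # Score each frame by its mean absolute difference from the previous frame.
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    scores = np.concatenate([[0.0], diffs])  # frame 0 has no predecessor
    # Keep the k highest-scoring frames, then restore temporal order.
    return np.sort(np.argsort(scores)[-k:])

# A 6-frame "video" that is static except for a sudden change at frame 3.
video = np.zeros((6, 4, 4))
video[3:] = 1.0
print(select_dynamic_frames(video, 1))  # → [3]
```

Sampling this way lets long videos fit a bounded token budget: only the frames where something actually happens are handed to the vision encoder.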

When to use Nemotron 3 Nano Omni

Nemotron 3 Nano Omni’s open, lightweight 30B-A3B architecture supports deployments in local environments, such as NVIDIA DGX systems, as well as datacenters and cloud environments. It’s designed for computer use, complex document intelligence, and audio and video reasoning.

Context matters in customer service, research, and monitoring workflows, and Nemotron 3 Nano Omni preserves unified multimodal context across audio, video, and documents within a single reasoning loop.

Scaling enterprise AI with Nemotron 3 Nano Omni on Baseten

Baseten is an AI infrastructure platform purpose-built for ultra-fast inference, and we're providing day-zero support for Nemotron 3 Nano Omni.

Our platform accelerates enterprise AI initiatives through the Baseten Inference Stack, which under the hood uses NVFP4, components of TensorRT-LLM, Dynamo, and the Baseten Speculation Engine, all running on NVIDIA Blackwell GPUs.

Building with NVIDIA Nemotron 3 Nano Omni in production

If you’re building agents that need to see, hear, and reason across workflows such as customer service, computer use, or document intelligence, Nemotron 3 Nano Omni provides a production-ready, open foundation to do all of this with a single model.

The model accepts multimodal inputs, including audio recordings, video, images, and documents, and performs unified reasoning across them in a single pass.
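To make the single-pass idea concrete, here is a sketch of a multimodal request in the OpenAI-compatible chat format that Baseten model endpoints commonly expose. The model id, media URLs, and the exact content-part types are placeholders and may differ for your deployment; check your endpoint's docs before using them.

```python
# Illustrative only: the model id, URLs, and audio content-part type below
# are placeholders, not confirmed values for this deployment.
payload = {
    "model": "nvidia/nemotron-3-nano-omni",  # placeholder model id
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Summarize the call recording and the attached slide."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/slide.png"}},
            {"type": "input_audio",
             "input_audio": {"data": "<base64-encoded-audio>",
                             "format": "wav"}},
        ],
    }],
}
```

Because perception and reasoning live in one model, a single request like this replaces what would otherwise be separate transcription, vision, and language calls.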

You can deploy Nemotron 3 Nano Omni on Baseten for scalable multimodal inference, or get in touch with our engineers to learn more about the performance, scale, security, and flexibility we offer enterprises, including our self-hosting capabilities.
