
NVIDIA Nemotron 3 Nano is a small language model with a hybrid mixture-of-experts (MoE) architecture, offering high compute efficiency and accuracy for developers to build specialized agentic AI systems. The model is fully open across weights, datasets and recipes so developers can easily customize, optimize and deploy the model. Enterprises across multiple industries can leverage Nemotron 3 Nano on Baseten for performant, secure, and highly scalable inference.
Today, Baseten is launching day-zero support for NVIDIA Nemotron 3 Nano, a highly accurate and efficient open model.
Nemotron 3 Nano uses a hybrid Transformer-Mamba mixture-of-experts architecture and delivers up to 4x faster token generation than the previous generation of Nemotron. This lets the model think faster while providing higher accuracy. The MoE architecture reduces compute needs and meets the stringent latency requirements of real-world applications. Finally, a configurable thinking budget prevents the model from overthinking, keeping inference costs lower and more predictable.
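To make the thinking budget concrete, here is a minimal sketch of how you might cap reasoning tokens when calling the model through an OpenAI-compatible chat endpoint. The base URL, model slug, and the max_thinking_tokens field are illustrative assumptions, not confirmed parameter names; check the model card and your serving stack's documentation for the exact controls.

```python
# Minimal sketch of capping reasoning with a thinking budget.
# The endpoint URL, model slug, and "max_thinking_tokens" field are
# assumptions for illustration; consult the model card for the actual knobs.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_BASETEN_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano",  # hypothetical model slug
    messages=[
        {"role": "system", "content": "You are a concise financial analysis assistant."},
        {"role": "user", "content": "Flag any anomalies in these transactions: ..."},
    ],
    max_tokens=1024,
    extra_body={"max_thinking_tokens": 512},  # hypothetical thinking-budget parameter
)

print(response.choices[0].message.content)
```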
Nemotron 3 Nano outperforms leading models in its size class in benchmarks comparing openness index vs. intelligence index.
Nemotron 3 Nano outperforms leading models in its class in benchmarks comparing intelligence vs. output speed.
Nemotron 3 Nano is available to deploy today on Baseten and leverages the Baseten Inference Stack for high-throughput, low-latency performance out of the box.
Using Nemotron 3 Nano in the enterprise
While every industry can benefit from Nemotron 3 Nano’s high accuracy and efficiency, its small size and consistent performance make it a great fit for use cases in the financial services and retail industries.
In financial services, accelerating loan processing by extracting data, analyzing income patterns, and detecting fraudulent activity, all reducing cycle times and risk.
In retail, optimizing inventory management and enhancing in-store service with real-time, personalized product recommendations and support.
And across all industries, supporting software development by assisting with tasks like code summarization.
As enterprises in the financial services and retail spaces build out differentiated agentic capabilities, they will find Nemotron 3 Nano highly accurate. The model was trained on NVIDIA-curated, high-quality synthetic data generated by expert reasoning models across various categories, and aligned with reinforcement learning to reason like humans.
When to use Nemotron 3 Nano
Nemotron 3 Nano is a small language model specifically trained to support specialized agentic AI systems. It is especially strong at extracting data and analyzing it to recognize breaks in patterns, for example to identify fraud in the financial services industry.
Despite its size, the model achieves outstanding accuracy thanks to high-quality training datasets and generated reinforcement learning training environments, making it a great choice for very targeted tasks for financial services and retail customers.
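As a concrete illustration of this kind of targeted task, the sketch below asks the model to pull structured fields out of a short loan application snippet over an OpenAI-compatible API. The endpoint URL, model slug, and JSON schema are assumptions for illustration only.

```python
# Minimal sketch of a targeted extraction task, assuming an OpenAI-compatible
# endpoint for Nemotron 3 Nano. The base URL and model slug are illustrative.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",  # assumed endpoint
    api_key="YOUR_BASETEN_API_KEY",
)

document = "Applicant: Jane Doe. Monthly income: $6,200. Requested loan: $18,000 over 36 months."

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano",  # hypothetical model slug
    messages=[
        {
            "role": "system",
            "content": "Extract loan application fields and reply with JSON only: "
                       '{"applicant": str, "monthly_income": float, "loan_amount": float, "term_months": int}',
        },
        {"role": "user", "content": document},
    ],
    temperature=0.0,  # deterministic output for extraction tasks
)

# Parse the model's JSON reply; in production, validate before trusting it.
fields = json.loads(response.choices[0].message.content)
print(fields["monthly_income"])
```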
Scaling enterprise AI with Nemotron 3 Nano on Baseten
As an AI infrastructure platform purpose-built for ultra-fast inference, Baseten provides day-zero support for Nemotron 3 Nano.
Our platform accelerates enterprise AI initiatives through:
High performance inference: We run generative AI models with exceptionally low latency, including our top tier GPT-OSS API powered by NVIDIA Dynamo and the NVIDIA Blackwell architecture.
Multi-cloud Capacity Management (MCM): Our customers tap into autoscaling GPU resources spanning all major hyperscaler and Neocloud providers, unified into a single compute layer. They benefit from active-active reliability, isolated workload planes and intelligent multi-cloud capacity management.
Expert engineering support: Collaboration with the Baseten Forward Deployed Engineers (FDE) provides hands-on assistance from specialists in large scale inference.
Robust enterprise security: Baseten is SOC 2 Type II and HIPAA compliant, offers self-hosting options as well as audit logging, SSO and other features essential for enterprise environments.
Baseten leverages a large number of NVIDIA technologies including NVIDIA Dynamo, NVIDIA TensorRT-LLM, NVIDIA TensorRT, custom CUDA kernels, and more in the Baseten Inference Stack.
Building with NVIDIA Nemotron 3 Nano in production
If you are building specialized agentic AI applications, particularly for financial services or retail use cases, you should check out Nemotron 3 Nano.
You can deploy Nemotron 3 Nano on Baseten today for scalable inference on NVIDIA’s latest small language model, or get in touch with our engineers to learn more about the performance, scale, security, and flexibility we offer enterprises, including our self-hosting capabilities.
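As a starting point, here is a minimal sketch of streaming tokens from a Nemotron 3 Nano deployment on Baseten using the OpenAI-compatible client. The base URL and model slug are assumptions; use the endpoint and model identifier shown on your Baseten deployment page.

```python
# Minimal sketch of streaming completions from a Nemotron 3 Nano deployment.
# The base URL and model slug are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",  # assumed endpoint
    api_key="YOUR_BASETEN_API_KEY",
)

stream = client.chat.completions.create(
    model="nvidia/nemotron-3-nano",  # hypothetical model slug
    messages=[{"role": "user", "content": "Summarize this return policy in three bullet points: ..."}],
    stream=True,  # stream tokens as they are generated for low perceived latency
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```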