
Nemotron

Next-gen open intelligent models


Nemotron 3 Nano 30B Specs and 1M Context

Nemotron 3 Nano 30B is a hybrid Mamba-Transformer sparse MoE model (6 of 128 experts active per token) with ~3.6B active parameters, a 1M-token context window, and Reasoning ON/OFF with configurable thinking budgets for agents and tool use.


Key specs

  • Architecture: Mamba‑2 + Transformer + sparse MoE (6/128 experts)
  • Params: 31.6B total, ~3.6B active per token
  • Context: 1,000,000 tokens (512k continued pretraining (CPT) + 4k mixed training)
  • Attention: GQA; Reasoning ON/OFF with per-request thinking budgets
  • Weights: BF16, compatible with vLLM / SGLang serving
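
As a rough illustration of the vLLM compatibility noted in the list above, here is a minimal offline-inference sketch; the model ID, context window, and sampling settings are placeholders rather than confirmed values.

```python
# Minimal vLLM offline-inference sketch (assumed-compatible per the spec list above).
# "nvidia/Nemotron-3-Nano-30B" is a placeholder model ID, not a confirmed repo name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Nemotron-3-Nano-30B",  # placeholder model ID
    dtype="bfloat16",                    # BF16 weights, per the spec list
    trust_remote_code=True,              # hybrid Mamba-Transformer models often need custom code
    max_model_len=131072,                # serve a smaller window than the full 1M to fit GPU memory
)

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Summarize the key specs of Nemotron 3 Nano 30B."], params)
print(outputs[0].outputs[0].text)
```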

Best for

  • Long-chain reasoning and tool use with predictable cost
  • High-concurrency multi-agent systems
  • Long-doc RAG/legal/research with 1M window

Openness & license

  • Open weights, data, and training recipes for reproduction
  • License: NVIDIA Open Model License (OML) for commercial use

FAQ

What is the context length?

Up to 1,000,000 tokens via 512k CPT plus 4k mixed training.

Does it support Reasoning ON/OFF?

Yes, with configurable thinking-token budgets per request.
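
A hypothetical per-request sketch against an OpenAI-compatible endpoint (for example, one served by vLLM): the `enable_thinking` chat-template flag and the `max_thinking_tokens` budget field are assumptions for illustration, not documented parameters.

```python
# Hypothetical sketch: toggling reasoning and capping the thinking budget per request.
# Assumes an OpenAI-compatible server at localhost:8000; "enable_thinking" and
# "max_thinking_tokens" are assumed names, not a documented API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-30B",  # placeholder model ID
    messages=[{"role": "user", "content": "Plan a three-step tool-use strategy."}],
    extra_body={
        "chat_template_kwargs": {"enable_thinking": True},  # Reasoning ON (assumed flag)
        "max_thinking_tokens": 2048,                         # per-request thinking budget (assumed field)
    },
)
print(resp.choices[0].message.content)
```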

Which serving stack is recommended?

vLLM or SGLang on H100/H200 GPUs for the best throughput.
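
For high-concurrency use, a client-side sketch that fans requests out to an already-running OpenAI-compatible endpoint (launched separately with vLLM or SGLang); the endpoint URL and model ID are placeholders.

```python
# Sketch: concurrent requests to a served endpoint (vLLM and SGLang both expose
# an OpenAI-compatible API). Endpoint URL and model ID are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="nvidia/Nemotron-3-Nano-30B",  # placeholder model ID
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Agent {i}: summarize your assigned document." for i in range(32)]
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    print(len(answers), "responses received")

asyncio.run(main())
```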