
Nemotron

Next-gen open intelligent models


Nemotron 3 Nano 30B Specs and 1M Context

Nemotron 3 Nano 30B is a hybrid Mamba-Transformer sparse MoE model (6 of 128 experts active per token) with ~3.6B active parameters, a 1M-token context window, and Reasoning ON/OFF with configurable thinking budgets for agents and tool use.


Key specs

  • Architecture: Mamba‑2 + Transformer + sparse MoE (6/128 experts)
  • Params: 31.6B total, ~3.6B active per token
  • Context: 1,000,000 tokens (512k continued pretraining (CPT) + 4k mixed training)
  • Attention: GQA; Reasoning ON/OFF with per-request thinking budgets
  • Weights: BF16, compatible with vLLM / SGLang serving
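
As a rough illustration of the vLLM compatibility noted in the list above, here is a minimal offline-inference sketch; the model ID, context window, and sampling settings are placeholders rather than confirmed values.

```python
# Minimal vLLM offline-inference sketch (assumed-compatible per the spec list above).
# "nvidia/Nemotron-3-Nano-30B" is a placeholder model ID, not a confirmed repo name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/Nemotron-3-Nano-30B",  # placeholder model ID
    dtype="bfloat16",                    # BF16 weights, per the spec list
    trust_remote_code=True,              # hybrid Mamba-Transformer models often need custom code
    max_model_len=131072,                # serve a smaller window than the full 1M to fit GPU memory
)

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Summarize the key specs of Nemotron 3 Nano 30B."], params)
print(outputs[0].outputs[0].text)
```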

Best for

  • Long-chain reasoning and tool use with predictable cost
  • High-concurrency multi-agent systems
  • Long-doc RAG/legal/research with 1M window

Openness & license

  • Open weights, data, and training recipes for reproduction
  • License: NVIDIA Open Model License (OML) for commercial use

FAQ

What is the context length?

Up to 1,000,000 tokens via 512k CPT plus 4k mixed training.

Does it support Reasoning ON/OFF?

Yes, with configurable thinking-token budgets per request.
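
A hypothetical per-request sketch against an OpenAI-compatible endpoint (for example, one served by vLLM): the `enable_thinking` chat-template flag and the `max_thinking_tokens` budget field are assumptions for illustration, not documented parameters.

```python
# Hypothetical sketch: toggling reasoning and capping the thinking budget per request.
# Assumes an OpenAI-compatible server at localhost:8000; "enable_thinking" and
# "max_thinking_tokens" are assumed names, not a documented API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="nvidia/Nemotron-3-Nano-30B",  # placeholder model ID
    messages=[{"role": "user", "content": "Plan a three-step tool-use strategy."}],
    extra_body={
        "chat_template_kwargs": {"enable_thinking": True},  # Reasoning ON (assumed flag)
        "max_thinking_tokens": 2048,                         # per-request thinking budget (assumed field)
    },
)
print(resp.choices[0].message.content)
```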

Which serving stack is recommended?

vLLM or SGLang on H100/H200 GPUs for the best throughput.
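
For high-concurrency use, a client-side sketch that fans requests out to an already-running OpenAI-compatible endpoint (launched separately with vLLM or SGLang); the endpoint URL and model ID are placeholders.

```python
# Sketch: concurrent requests to a served endpoint (vLLM and SGLang both expose
# an OpenAI-compatible API). Endpoint URL and model ID are placeholders.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="nvidia/Nemotron-3-Nano-30B",  # placeholder model ID
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Agent {i}: summarize your assigned document." for i in range(32)]
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    print(len(answers), "responses received")

asyncio.run(main())
```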