Key metrics
- ~3.3× the throughput of Qwen3-30B on H200 (8K input → 16K output)
- ~3.6B active params (~11% active share)
- 1M-token context window with stable long-context performance
Benchmarks
On H200, Nemotron 3 Nano 30B delivers ~3.3× the throughput of Qwen3-30B (8K input → 16K output) with ~3.6B active params and a 1M-token context window.
The sparse MoE design (6 of 128 experts active per token) keeps the active parameter count low, which reduces per-token compute.
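As a rough illustration of why sparsity cuts per-token compute, the ratios can be worked out directly. This sketch assumes "6/128" means 6 experts selected per token out of 128, and it uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass (attention and state-space costs ignored). The helper name `active_fraction` is illustrative only.

```python
def active_fraction(active: float, total: float) -> float:
    """Share of units that take part in one token's forward pass."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total

# Assumption: "6/128" = 6 experts chosen per token out of 128.
expert_share = active_fraction(6, 128)

# Rule-of-thumb estimate, not a measured figure:
# forward FLOPs per token ~= 2 * active parameters.
active_params = 3.6e9
approx_flops_per_token = 2 * active_params

print(f"experts active per token: {expert_share:.1%}")
print(f"approx forward FLOPs per token: {approx_flops_per_token:.1e}")
```

Only a small slice of the expert weights is touched for each token, so compute tracks the ~3.6B active params rather than the full model size.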
With max_tokens and batch size tuned to the workload, throughput remains stable at long context lengths.
Existing vLLM or SGLang benchmark scripts can be reused once their maximum length settings are updated.
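When updating the length settings, the configured maximum has to cover the prompt plus the generated tokens. A minimal sketch of that check follows, assuming "8K→16K" denotes 8K input tokens and 16K output tokens and that K means 1024. `required_context` and `tokens_per_second` are hypothetical helpers, not part of vLLM or SGLang.

```python
K = 1024  # assumption: "8K" means 8 * 1024 tokens

def required_context(input_tokens: int, output_tokens: int) -> int:
    """Minimum context length that fits the prompt plus the full generation."""
    return input_tokens + output_tokens

def tokens_per_second(generated_tokens: int, elapsed_s: float) -> float:
    """Output throughput for a completed run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return generated_tokens / elapsed_s

# Assumed workload: 8K-token prompts, 16K-token generations.
needed = required_context(8 * K, 16 * K)
print(needed)  # 24576
```

The serving engine's maximum length (for example `--max-model-len` in vLLM or `--context-length` in SGLang) would need to be at least this value for the 8K/16K workload to run without truncation.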