
Nemotron 3 Benchmarks: Throughput, Latency, and Context

On an H200 GPU, Nemotron 3 Nano 30B delivers roughly 3.3× the throughput of Qwen3-30B in the 8K-input, 16K-output setting, with about 3.6B active parameters per token and a 1M-token context window.


Key metrics

Why is throughput higher?

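The model uses a sparse mixture-of-experts (MoE) design that routes each token to 6 of 128 experts. Only about 3.6B of the 30B parameters are active per token, which reduces per-token compute.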
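
As a rough illustration of that sparsity, the arithmetic can be sketched in a few lines. The constants are the approximate figures quoted on this page. The "2 × active parameters" FLOPs estimate is a common rule of thumb that ignores attention and routing overhead; it is an assumption here, not a measured value.

```python
# Approximate figures quoted on this page.
TOTAL_PARAMS_B = 30.0    # total parameters, in billions
ACTIVE_PARAMS_B = 3.6    # active parameters per token, in billions
EXPERTS_TOTAL = 128      # experts available to the router
EXPERTS_ACTIVE = 6       # experts selected per token


def fraction(part: float, whole: float) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Share of experts and of parameters that participate in each token.
expert_fraction = fraction(EXPERTS_ACTIVE, EXPERTS_TOTAL)
param_fraction = fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)

# Rule-of-thumb forward-pass cost (assumption): ~2 FLOPs per active parameter.
approx_flops_per_token = 2 * ACTIVE_PARAMS_B * 1e9

print(f"experts used per token: {expert_fraction:.1%}")
print(f"parameters active per token: {param_fraction:.1%}")
print(f"approx. forward FLOPs per token: {approx_flops_per_token:.2e}")
```

On these figures, each token uses under 5% of the experts and roughly 12% of the parameters.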

Does 1M context hurt speed?

Not when max_tokens is set appropriately and requests are batched: under those conditions, throughput remains stable at long context lengths.
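
The page does not spell out what appropriate max_tokens and batching look like. One possible reading is sketched below: cap each request's output budget so that prompt plus output fits the context window, and group prompts so that each batch stays under a token budget. The helper names, the exact 1,000,000-token limit, and the greedy grouping strategy are all assumptions made for illustration.

```python
from typing import List, Sequence

CONTEXT_WINDOW = 1_000_000  # "1M" per this page; the exact limit is assumed


def clamp_max_tokens(prompt_tokens: int, requested_max_tokens: int,
                     context_window: int = CONTEXT_WINDOW) -> int:
    """Cap the output budget so prompt + output fits in the context window."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested_max_tokens, remaining)


def batch_by_token_budget(prompt_lengths: Sequence[int],
                          token_budget: int) -> List[List[int]]:
    """Greedily group prompt indices so each batch stays under token_budget.

    A prompt longer than the budget is placed in a batch of its own.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, length in enumerate(prompt_lengths):
        # Start a new batch when adding this prompt would exceed the budget.
        if current and used + length > token_budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += length
    if current:
        batches.append(current)
    return batches
```

For example, `clamp_max_tokens(990_000, 16_000)` leaves 10,000 output tokens, and a single 500K-token prompt ends up in its own batch rather than crowding out shorter requests.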

Are benchmark scripts available?

Reuse the existing vLLM or SGLang benchmark scripts, updating their maximum-length settings to match the context lengths being tested.
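
Whichever script is reused, the headline number reduces to generated tokens divided by wall-clock time, and the comparison to a ratio of two such numbers. The sketch below shows that calculation in an engine-agnostic form. The `generate` callable is a hypothetical adapter that would wrap a vLLM or SGLang call, and `dummy_generate` is a stand-in that produces no real measurements.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical adapter signature: (prompts, max_tokens) -> output tokens per prompt.
GenerateFn = Callable[[Sequence[str], int], List[int]]


def measure_output_throughput(generate: GenerateFn, prompts: Sequence[str],
                              max_tokens: int) -> float:
    """Return generated tokens per second for one batch of prompts."""
    start = time.perf_counter()
    output_token_counts = generate(prompts, max_tokens)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return sum(output_token_counts) / elapsed


def relative_throughput(candidate_tps: float, baseline_tps: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return candidate_tps / baseline_tps


def dummy_generate(prompts: Sequence[str], max_tokens: int) -> List[int]:
    """Stand-in backend: sleeps briefly and reports a full budget per prompt."""
    time.sleep(0.01)
    return [max_tokens] * len(prompts)
```

Running both models through the same adapter, with the same prompts and max_tokens, keeps the comparison like for like. Passing the two results to `relative_throughput` yields a ratio comparable to the ~3.3× figure quoted above.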