N3

Nemotron

Next-gen open intelligent models

Reasoning control

Reasoning ON/OFF and Thinking Budgets in Nemotron 3

Control chain-of-thought depth with Reasoning ON/OFF and thinking-token budgets to balance accuracy, privacy, and cost.

nemotron reasoningthinking budgetreasoning on offchain of thought controlnemotron budgets

Modes

  • ON: keep chain-of-thought for multi-step reasoning, tools, math.
  • OFF: concise replies, better for chat and fan-out agents.

Budgets

  • Declare max thinking tokens in prompt/request; budget caps depth.
  • Use with ON to balance quality and cost.

Best practices

  • Default OFF for frequent calls; toggle ON per task with budgets.
  • Safety: OFF reduces CoT leakage in sensitive contexts.
  • Monitor thinking tokens and refine prompts/limits over time.

FAQ

How to set budgets via API?

Include a max thinking-token limit in the prompt or request payload.

Does OFF reduce quality?

Minimal impact on simple Q&A; for complex reasoning use ON with budgets.

Can I switch per request?

Yes. Toggle ON/OFF and budgets per call based on scenario.