Nemotron 3 API: Calls, Examples, and Budgets

Call Nemotron 3 via vLLM, SGLang, or OpenRouter with examples for streaming, tool use, and thinking-budget control.


REST example (vLLM)

  • POST `/generate`: `{ "prompt": "<text>", "max_tokens": 256 }` against a running vLLM server.
  • Budget: cap thinking tokens via the prompt or request parameters (see the budget FAQ below).
  • Streaming: set `"stream": true` in the request body; tokens arrive as server-sent events (SSE). A minimal request sketch follows this list.
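A minimal non-streaming sketch, assuming a vLLM server on `localhost:8000` exposing its native `/generate` route; the exact response schema varies across vLLM versions, so treat this as a starting point rather than a contract.

```python
import requests

# Minimal sketch: POST to a vLLM server's native /generate route.
# Assumes vLLM is serving on localhost:8000; adjust host/port to your
# deployment. The response schema varies by vLLM version.
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Explain thinking budgets in one sentence.", "max_tokens": 256},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```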

SGLang

  • Python: SGLang exposes an OpenAI-compatible endpoint, so the standard `openai` client works against it (see the sketch after this list).
  • Tools: declare functions in the request's tool schema; with Reasoning ON, the model keeps its chain-of-thought while calling tools.
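A minimal client sketch, assuming SGLang was launched with something like `python -m sglang.launch_server --model-path <nemotron-3-checkpoint> --port 30000`; the served model name below is a placeholder.

```python
from openai import OpenAI

# Point the standard OpenAI client at the SGLang server's
# OpenAI-compatible endpoint (no real key is needed for a local server).
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="nemotron3",  # placeholder; use the name the server reports
    messages=[{"role": "user", "content": "What is a thinking budget?"}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```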

OpenRouter / build.nvidia.com

  • Model: `nvidia/nemotron-3-nano-30b-instruct`; thinking budgets are supported (a streaming client sketch follows this list).
  • Enterprise: use build.nvidia.com for hosted inference and SLAs.
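A streaming sketch against OpenRouter, which speaks the OpenAI chat API; the model slug is the one listed above, but check openrouter.ai for the current listing.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",  # your OpenRouter key
)

# stream=True delivers SSE chunks through the client's iterator interface.
stream = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-instruct",
    messages=[{"role": "user", "content": "Summarize SSE in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```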

Multi-agent guidance

  • Default Reasoning OFF for fan-out agents; switch it ON selectively (e.g., for a planner).
  • Align tool schemas across agents to avoid parsing conflicts.
  • Log token usage per task type to tune budgets (see the sketch after this list).
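A sketch of the logging point, assuming an OpenAI-compatible endpoint as above. The system-prompt reasoning toggle is an assumption, not a documented switch; check the model card for the exact control.

```python
from collections import defaultdict
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
usage_by_task: dict[str, list[int]] = defaultdict(list)

def run_agent(task_type: str, prompt: str, reasoning: bool = False) -> str:
    # Assumed toggle: a system-prompt switch; verify against the model card.
    system = "Reasoning: on" if reasoning else "Reasoning: off"
    resp = client.chat.completions.create(
        model="nemotron3",  # placeholder served-model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        max_tokens=512,
    )
    # Record completion tokens per task type to tune budgets later.
    usage_by_task[task_type].append(resp.usage.completion_tokens)
    return resp.choices[0].message.content
```

Reviewing a percentile (e.g., p95) of `usage_by_task` for each task type gives a data-driven starting point for per-task budgets.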

FAQ

Is streaming supported?

Yes, vLLM and OpenRouter support SSE streaming responses.

How do I cap thinking tokens?

Declare a maximum thinking-token budget in the prompt or request payload, and combine it with the Reasoning ON/OFF toggles. A hedged payload sketch follows.
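In the sketch below, `max_thinking_tokens` is an illustrative, hypothetical field name: the real budget parameter (if any) depends on the serving stack, and `extra_body` simply passes it through unmodified.

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="<OPENROUTER_API_KEY>")

resp = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-instruct",
    messages=[{"role": "user", "content": "Plan a three-step migration."}],
    max_tokens=512,
    # Hypothetical field name: the actual budget parameter depends on the
    # serving stack; extra_body forwards it to the server untouched.
    extra_body={"max_thinking_tokens": 128},
)
print(resp.choices[0].message.content)
```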

Does tool use work out of the box?

Yes. Declare function/tool schemas in the request; keep the tool count small to control prompt cost. A schema sketch is shown below.
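A minimal tool-schema sketch in the OpenAI tools format, which the OpenAI-compatible endpoints above accept; the tool itself is illustrative, not part of any Nemotron API.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

# One small, well-scoped tool; keeping the tool list short limits prompt cost.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real API
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="nemotron3",  # placeholder served-model name
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```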