API & SDK
Call Nemotron 3 via vLLM, SGLang, or OpenRouter, with examples for streaming, tool use, and thinking-budget control.

REST example (vLLM)
- POST `/generate` with a JSON body such as `{ "prompt": "<text>", "max_tokens": 256 }`.
- Budget: set a maximum thinking-token budget in the prompt or in the request parameters.
- Streaming: send `Accept: text/event-stream` to receive the response as SSE.
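The `/generate` request above can be sketched in Python as follows. This is a minimal sketch, not a definitive client: it assumes a vLLM server listening on `localhost:8000` exposing the `/generate` endpoint with the fields shown in the example, and the prompt-level budget instruction wording is illustrative, not a fixed API.

```python
import json
import urllib.request
from typing import Optional

def build_payload(prompt: str, max_tokens: int = 256,
                  thinking_budget: Optional[int] = None) -> dict:
    """Request body for POST /generate, matching the example request.

    If a thinking budget is given, prepend it as a prompt-level
    instruction (the wording here is illustrative, not a fixed API).
    """
    if thinking_budget is not None:
        prompt = f"Use at most {thinking_budget} thinking tokens.\n{prompt}"
    return {"prompt": prompt, "max_tokens": max_tokens}

def generate(prompt: str, base_url: str = "http://localhost:8000", **kwargs) -> dict:
    """POST the payload to the server and decode the JSON response."""
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```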
FAQ
- Does it support streaming? Yes; vLLM and OpenRouter both support SSE streaming responses.
- Can I control the thinking budget? Yes; declare a maximum thinking-token budget in the prompt or request payload, and combine it with the thinking ON/OFF toggle.
- Does it support tool calling? Yes, via function/tool schemas; limit the number of exposed functions to control cost.
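SSE responses arrive as `data: …` lines. A minimal client-side parser for the streaming case might look like the sketch below; it assumes the common OpenAI-style convention where each event carries a JSON chunk and the stream ends with a `data: [DONE]` sentinel.

```python
import json
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from an SSE stream, one per `data:` line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
            break
        yield json.loads(payload)
```

In practice you would iterate over the lines of the HTTP response body and concatenate each chunk's text field as it arrives.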
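Tool calling uses function/tool schemas in the OpenAI-compatible `tools` format. The sketch below builds one such schema (the `get_weather` tool is hypothetical) and trims the list to a fixed count, illustrating the cost-control advice above.

```python
def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON-Schema parameter spec in the OpenAI-compatible tool format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

# Hypothetical example tool; real deployments would list their own functions.
tools = [
    make_tool(
        "get_weather",
        "Look up the current weather for a city.",
        {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    ),
]

def limit_tools(tools: list, max_tools: int) -> list:
    """Expose only the first max_tools schemas to keep prompt cost down."""
    return tools[:max_tools]
```

The resulting list is passed as the `tools` field of a chat-completions request; fewer exposed schemas means fewer tokens spent on tool definitions per call.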