reasoning_content field, distinct from the final response.
Supported models
| Model | Slug | Reasoning |
|---|---|---|
| DeepSeek V4 Pro | deepseek-ai/DeepSeek-V4-Pro | Enabled by default |
| OpenAI GPT OSS 120B | openai/gpt-oss-120b | Enabled by default |
| Kimi K2.5 | moonshotai/Kimi-K2.5 | Opt-in through chat_template_args |
| Kimi K2.6 | moonshotai/Kimi-K2.6 | Opt-in through chat_template_args |
| Kimi K2.7 Code | moonshotai/Kimi-K2.7-Code | Opt-in through chat_template_args |
| GLM 4.7 | zai-org/GLM-4.7 | Opt-in through chat_template_args |
| GLM 5 | zai-org/GLM-5 | Opt-in through chat_template_args |
| GLM 5.1 | zai-org/GLM-5.1 | Opt-in through chat_template_args |
| GLM 5.2 | zai-org/GLM-5.2 | Opt-in through chat_template_args |
| Nemotron Super | nvidia/Nemotron-120B-A12B | Opt-in through chat_template_args |
| Nemotron Ultra | nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B | Opt-in through chat_template_args |
reasoning_effort.
Models not listed here don’t support reasoning.
Enable thinking
Enable thinking for Kimi K2.5, Kimi K2.6, Kimi K2.7 Code, GLM 4.7, GLM 5, GLM 5.1, GLM 5.2, Nemotron Super, and Nemotron Ultra by passingchat_template_args.
- Python
- JavaScript
- cURL
Pass
chat_template_args through extra_body since it extends the standard OpenAI API:enable_thinking.py
Control reasoning depth
Thereasoning_effort parameter controls how thoroughly the model reasons through a problem. DeepSeek V4 Pro and GPT OSS 120B support this parameter.
| Value | Behavior |
|---|---|
low | Faster responses, less thorough reasoning |
medium | Balanced (default) |
high | Slower responses, more thorough reasoning |
xhigh | Maximum reasoning depth, highest token cost (DeepSeek V4 Pro only) |
- DeepSeek V4 Pro
- GPT OSS 120B
- Python
- JavaScript
- cURL
Pass
reasoning_effort through extra_body since it extends the standard OpenAI API:reasoning_effort.py
reasoning_effort to low.
Parse the response
The model’s thinking process appears inreasoning_content, separate from the final answer in content. Both fields are returned on the message object.
- Python
- JavaScript
- cURL
Read
reasoning_content and content directly off the message object:parse_reasoning.py
Response
completion_tokens and count toward your total usage and billing.
Next steps
Model APIs overview
Supported models, pricing, and the feature support matrix
Structured outputs
Constrain reasoning models to a JSON schema