Some Model APIs support extended thinking, where the model reasons through a problem before producing a final answer. The reasoning process generates additional tokens that appear in a separate reasoning_content field, distinct from the final response.

Supported models

| Model | Slug | Reasoning |
| --- | --- | --- |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | Enabled by default |
| DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 | Enabled by default |
| Minimax M2.5 | MiniMaxAI/MiniMax-M2.5 | Enabled by default |
| OpenAI GPT OSS 120B | openai/gpt-oss-120b | Enabled by default |
| Kimi K2.5 | moonshotai/Kimi-K2.5 | Opt-in via chat_template_args |
| GLM 4.7 | zai-org/GLM-4.7 | Opt-in via chat_template_args |
| GLM 4.6 | zai-org/GLM-4.6 | Opt-in via chat_template_args |
GPT OSS 120B also supports reasoning_effort. Models not listed here don’t support reasoning.

Enable thinking

Thinking is opt-in for Kimi K2.5 and the GLM models. Enable it by passing chat_template_args through extra_body, since it extends the standard OpenAI API:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY"),
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[{"role": "user", "content": "What is the sum of the first 100 prime numbers?"}],
    extra_body={"chat_template_args": {"enable_thinking": True}},
    max_tokens=4096,
    stream=True,
)
```

Control reasoning depth

The reasoning_effort parameter controls how thoroughly the model reasons through a problem. Currently, only GPT OSS 120B supports this parameter.
| Value | Behavior |
| --- | --- |
| low | Faster responses, less thorough reasoning |
| medium | Balanced (default) |
| high | Slower responses, more thorough reasoning |
Pass reasoning_effort through extra_body since it extends the standard OpenAI API:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY"),
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "What is the sum of the first 100 prime numbers?"}
    ],
    extra_body={"reasoning_effort": "high"},
)

print(response.choices[0].message.content)
```

Parse the response

The model’s thinking process appears in reasoning_content, separate from the final answer in content. Both fields are returned on the message object.
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The sum of the first 100 prime numbers is 24,133.",
        "reasoning_content": "Let me work through this step by step. The first prime number is 2..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 90,
    "completion_tokens": 3423,
    "total_tokens": 3513
  }
}
```
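Because reasoning_content is not part of the OpenAI SDK's typed schema, how it surfaces on the message object can vary by SDK version: it may appear as a plain attribute or only through pydantic's model_extra. A defensive accessor, as a sketch rather than an official API:

```python
def get_reasoning(message):
    """Return the reasoning_content of a chat message, or None.

    Checks a direct attribute first, then falls back to model_extra,
    where pydantic-based SDKs stow fields outside their typed schema.
    """
    value = getattr(message, "reasoning_content", None)
    if value is not None:
        return value
    extra = getattr(message, "model_extra", None) or {}
    return extra.get("reasoning_content")
```

Used as `get_reasoning(response.choices[0].message)`, this works whether or not the field is promoted to a first-class attribute.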
Reasoning tokens are included in completion_tokens and count toward your total usage and billing.

Decide when to reason

Reasoning improves quality for tasks that benefit from step-by-step thinking: mathematical calculations, multi-step logic problems, code generation with complex requirements, and analysis requiring multiple considerations. For straightforward tasks like simple Q&A or text generation, reasoning adds latency and token cost without improving quality. In these cases, use a model without reasoning support or set reasoning_effort to low.
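One way to act on this guidance is to choose the effort level per request. The task categories and mapping below are purely illustrative, not part of the API:

```python
# Illustrative mapping from task type to reasoning depth; the task
# categories here are an example, not part of the API.
EFFORT_BY_TASK = {
    "math": "high",             # multi-step calculation benefits from reasoning
    "code_generation": "high",  # complex requirements benefit from planning
    "analysis": "medium",
    "simple_qa": "low",         # reasoning adds latency without quality gains
}

def effort_for(task_type: str) -> str:
    """Pick a reasoning_effort value, defaulting to the API's "medium"."""
    return EFFORT_BY_TASK.get(task_type, "medium")
```

The chosen value is then passed as `extra_body={"reasoning_effort": effort_for(task_type)}` on models that support it.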