Some Model APIs support extended thinking, where the model reasons through a problem before producing a final answer. The reasoning process generates additional tokens that appear in a separate reasoning_content field, distinct from the final response.

Supported models

| Model | Slug | Reasoning |
| --- | --- | --- |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | Enabled by default |
| DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 | Enabled by default |
| Minimax M2.5 | MiniMaxAI/MiniMax-M2.5 | Enabled by default |
| OpenAI GPT OSS 120B | openai/gpt-oss-120b | Enabled by default |
| Kimi K2.5 | moonshotai/Kimi-K2.5 | Opt-in via chat_template_args |
| GLM 4.7 | zai-org/GLM-4.7 | Opt-in via chat_template_args |
| GLM 4.6 | zai-org/GLM-4.6 | Opt-in via chat_template_args |
GPT OSS 120B also supports reasoning_effort. Models not listed here don’t support reasoning.

Enable thinking

Thinking is opt-in for Kimi K2.5 and the GLM models. Enable it by passing chat_template_args through extra_body, since it extends the standard OpenAI API:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY"),
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[{"role": "user", "content": "What is the sum of the first 100 prime numbers?"}],
    extra_body={"chat_template_args": {"enable_thinking": True}},
    max_tokens=4096,
    stream=True,
)
```

Control reasoning depth

The reasoning_effort parameter controls how thoroughly the model reasons through a problem. Currently, only GPT OSS 120B supports this parameter.
| Value | Behavior |
| --- | --- |
| low | Faster responses, less thorough reasoning |
| medium | Balanced (default) |
| high | Slower responses, more thorough reasoning |
Pass reasoning_effort through extra_body since it extends the standard OpenAI API:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY"),
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "What is the sum of the first 100 prime numbers?"}
    ],
    extra_body={"reasoning_effort": "high"},
)

print(response.choices[0].message.content)
```

Parse the response

The model’s thinking process appears in reasoning_content, separate from the final answer in content. Both fields are returned on the message object.
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The sum of the first 100 prime numbers is 24,133.",
        "reasoning_content": "Let me work through this step by step. The first prime number is 2..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 90,
    "completion_tokens": 3423,
    "total_tokens": 3513
  }
}
```
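Because reasoning_content is not part of the OpenAI SDK's typed schema, how it surfaces on the message object can vary by SDK version: it may appear as a plain attribute or only through pydantic's model_extra. A defensive accessor, as a sketch rather than an official API:

```python
def get_reasoning(message):
    """Return the reasoning_content of a chat message, or None.

    Checks a direct attribute first, then falls back to model_extra,
    where pydantic-based SDKs stow fields outside their typed schema.
    """
    value = getattr(message, "reasoning_content", None)
    if value is not None:
        return value
    extra = getattr(message, "model_extra", None) or {}
    return extra.get("reasoning_content")
```

Used as `get_reasoning(response.choices[0].message)`, this works whether or not the field is promoted to a first-class attribute.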
Reasoning tokens are included in completion_tokens and count toward your total usage and billing.

Decide when to reason

Reasoning improves quality for tasks that benefit from step-by-step thinking: mathematical calculations, multi-step logic problems, code generation with complex requirements, and analysis requiring multiple considerations. For straightforward tasks like simple Q&A or text generation, reasoning adds latency and token cost without improving quality. In these cases, use a model without reasoning support or set reasoning_effort to low.
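One way to act on this guidance is to choose the effort level per request. The task categories and mapping below are purely illustrative, not part of the API:

```python
# Illustrative mapping from task type to reasoning depth; the task
# categories here are an example, not part of the API.
EFFORT_BY_TASK = {
    "math": "high",             # multi-step calculation benefits from reasoning
    "code_generation": "high",  # complex requirements benefit from planning
    "analysis": "medium",
    "simple_qa": "low",         # reasoning adds latency without quality gains
}

def effort_for(task_type: str) -> str:
    """Pick a reasoning_effort value, defaulting to the API's "medium"."""
    return EFFORT_BY_TASK.get(task_type, "medium")
```

The chosen value is then passed as `extra_body={"reasoning_effort": effort_for(task_type)}` on models that support it.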