Some Model APIs support extended thinking, where the model reasons through a problem before producing a final answer. The reasoning process generates additional tokens that appear in a separate reasoning_content field, distinct from the final response.

Supported models

| Model | Slug | Reasoning |
| --- | --- | --- |
| DeepSeek V3.2 | deepseek-ai/DeepSeek-V3.2 | Enabled by default |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | Enabled by default |
| DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 | Enabled by default |
| Kimi K2 Thinking | moonshotai/Kimi-K2-Thinking | Always enabled |
| GLM 4.7 | zai-org/GLM-4.7 | Enabled by default |
| GLM 4.6 | zai-org/GLM-4.6 | Enabled by default |
Models not listed here do not support reasoning.

Control reasoning depth

The reasoning_effort parameter controls how thoroughly the model reasons through a problem.
| Value | Behavior |
| --- | --- |
| low | Faster responses, less thorough reasoning |
| medium | Balanced (default) |
| high | Slower responses, more thorough reasoning |
Because reasoning_effort is not part of the standard OpenAI API, pass it through extra_body:
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[
        {"role": "user", "content": "What is the sum of the first 100 prime numbers?"}
    ],
    extra_body={"reasoning_effort": "high"}  
)

print(response.choices[0].message.content)

Parse the response

The model’s thinking process appears in reasoning_content, separate from the final answer in content.
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The sum of the first 100 prime numbers is 24,133.",
        "reasoning_content": "Let me work through this step by step. The first prime number is 2..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 245,
    "total_tokens": 263,
    "completion_tokens_details": {
      "reasoning_tokens": 198
    }
  }
}
The reasoning_tokens field in completion_tokens_details shows how many tokens the model used for reasoning. These tokens count toward your total usage and billing.
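Below is a minimal sketch of reading both fields from the SDK response object, continuing the client setup above. It assumes the OpenAI Python client exposes the non-standard reasoning_content field as an extra attribute on the message; depending on your client version, you may need to read it from message.model_extra instead.

message = response.choices[0].message

# reasoning_content is not part of the standard OpenAI schema, so the SDK
# surfaces it as an extra field; fall back to model_extra if the attribute
# is not set directly.
reasoning = getattr(message, "reasoning_content", None)
if reasoning is None and message.model_extra:
    reasoning = message.model_extra.get("reasoning_content")

print("Reasoning:", reasoning)
print("Answer:", message.content)

# Reasoning tokens are reported under completion_tokens_details.
details = response.usage.completion_tokens_details
if details is not None:
    print("Reasoning tokens:", details.reasoning_tokens)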

Decide when to reason

Reasoning improves quality for tasks that benefit from step-by-step thinking: mathematical calculations, multi-step logic problems, code generation with complex requirements, and analysis requiring multiple considerations. For straightforward tasks like simple Q&A or text generation, reasoning adds latency and token cost without improving quality. In these cases, use a model without reasoning support or set reasoning_effort to low.
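For example, here is a minimal sketch of dialing effort down for a simple factual question, reusing the client from the earlier example (the model slug is just an illustration):

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    # A straightforward lookup does not benefit from deep reasoning,
    # so keep latency and token cost down.
    extra_body={"reasoning_effort": "low"}
)

print(response.choices[0].message.content)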