Some Model APIs support extended thinking, where the model reasons through a problem before producing a final answer. The reasoning process generates additional tokens that appear in a separate reasoning_content field, distinct from the final response.

Supported models

| Model | Slug | Reasoning |
| --- | --- | --- |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | Enabled by default |
| DeepSeek V4 Pro | deepseek-ai/DeepSeek-V4-Pro | Enabled by default |
| MiniMax M2.5 | MiniMaxAI/MiniMax-M2.5 | Enabled by default |
| Nemotron Super | nvidia/Nemotron-120B-A12B | Enabled by default |
| OpenAI GPT OSS 120B | openai/gpt-oss-120b | Enabled by default |
| Kimi K2.5 | moonshotai/Kimi-K2.5 | Opt-in via chat_template_args |
| Kimi K2.6 | moonshotai/Kimi-K2.6 | Opt-in via chat_template_args |
| GLM 4.7 | zai-org/GLM-4.7 | Opt-in via chat_template_args |
| GLM 5 | zai-org/GLM-5 | Opt-in via chat_template_args |

DeepSeek V4 Pro and GPT OSS 120B also support reasoning_effort. Models not listed here don’t support reasoning.

Enable thinking

Kimi K2.5, Kimi K2.6, GLM 4.7, and GLM 5 are opt-in: enable thinking by setting enable_thinking to true in chat_template_args.
Pass chat_template_args through extra_body, since it extends the standard OpenAI API:
# Assumes `client` is an OpenAI client pointed at Baseten, as configured in the later examples.
response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.5",
    messages=[{"role": "user", "content": "What is the sum of the first 100 prime numbers?"}],
    extra_body={"chat_template_args": {"enable_thinking": True}},
    max_tokens=4096,
    stream=True,
)
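
Because the request above sets stream=True, the call returns an iterator of chunks rather than a single message. The following is a minimal sketch of one way to consume it. It assumes that each streamed delta exposes reasoning text in a reasoning_content attribute, mirroring the non-streaming field; this page only documents the non-streaming message, so treat that attribute name as an assumption. collect_stream is a hypothetical helper, not part of any SDK.

```python
def collect_stream(chunks):
    """Accumulate reasoning and answer text from streamed chat chunks.

    Assumption: a delta may carry `reasoning_content` alongside `content`.
    getattr() keeps the loop safe when either attribute is missing.
    """
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)
```

With the streaming request above, this could be called as `reasoning, answer = collect_stream(response)`.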

Control reasoning depth

The reasoning_effort parameter controls how thoroughly the model reasons through a problem. DeepSeek V4 Pro and GPT OSS 120B support this parameter.

| Value | Behavior |
| --- | --- |
| low | Faster responses, less thorough reasoning |
| medium | Balanced (default) |
| high | Slower responses, more thorough reasoning |
| xhigh | Maximum reasoning depth, highest token cost (DeepSeek V4 Pro only) |

Pass reasoning_effort through extra_body since it extends the standard OpenAI API:
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {"role": "user", "content": "What is the sum of the first 100 prime numbers?"}
    ],
    extra_body={"reasoning_effort": "high"},  # not a standard OpenAI parameter
)

print(response.choices[0].message.content)
Reasoning improves quality for tasks that benefit from step-by-step thinking: mathematical calculations, multi-step logic problems, code generation with complex requirements, and analysis requiring multiple considerations. For straightforward tasks like simple Q&A or text generation, reasoning adds latency and token cost without improving quality. In these cases, use a model without reasoning support or set reasoning_effort to low.
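
The per-model rules above can be gathered into one place. The sketch below builds the extra_body dictionary for a given model slug, using only the slugs and effort values listed in the tables on this page. reasoning_extra_body, OPT_IN_MODELS, and EFFORT_LEVELS are hypothetical names introduced for illustration, and the validation is done client-side; how the API itself responds to an unsupported value is not covered here.

```python
# Slugs and values copied from the tables on this page.
OPT_IN_MODELS = {
    "moonshotai/Kimi-K2.5",
    "moonshotai/Kimi-K2.6",
    "zai-org/GLM-4.7",
    "zai-org/GLM-5",
}
EFFORT_LEVELS = {
    "deepseek-ai/DeepSeek-V4-Pro": {"low", "medium", "high", "xhigh"},
    "openai/gpt-oss-120b": {"low", "medium", "high"},
}


def reasoning_extra_body(model, effort=None):
    """Return an extra_body dict for `model` (hypothetical helper)."""
    body = {}
    if model in OPT_IN_MODELS:
        # Opt-in models need thinking switched on explicitly.
        body["chat_template_args"] = {"enable_thinking": True}
    if effort is not None:
        allowed = EFFORT_LEVELS.get(model)
        if allowed is None:
            raise ValueError(f"{model} does not support reasoning_effort")
        if effort not in allowed:
            raise ValueError(f"{effort!r} is not valid for {model}")
        body["reasoning_effort"] = effort
    return body
```

The result can be passed straight to the client, for example `extra_body=reasoning_extra_body("openai/gpt-oss-120b", "low")`.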

Parse the response

The model’s thinking process appears in reasoning_content, separate from the final answer in content. Both fields are returned on the message object, so you can read them directly:
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY"),
)

response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[{"role": "user", "content": "Is 91 a prime number? Answer in one sentence."}],
    extra_body={"chat_template_args": {"enable_thinking": True}},
)

message = response.choices[0].message
print("Reasoning:", message.reasoning_content)
print("Answer:", message.content)
The response body contains both fields on the assistant message:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "The user is asking whether 91 is a prime number... 91 = 7 × 13, so it is not prime...",
        "content": "No, 91 is not a prime number because it can be factored as $7 \\times 13$."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 21,
    "completion_tokens": 203,
    "total_tokens": 224
  }
}
Reasoning tokens are included in completion_tokens and count toward your total usage and billing.
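
To make the accounting concrete, here is a small sketch that reads the usage block from a raw response body shaped like the one above. It only relies on the fields shown in that sample; the sample does not break reasoning tokens out separately, so the sketch reports completion_tokens as a whole. summarize_usage is a hypothetical helper, and the sample string is a trimmed copy of the response shown earlier.

```python
import json


def summarize_usage(body):
    """Summarize token usage from a raw chat completion body (hypothetical helper)."""
    data = json.loads(body)
    message = data["choices"][0]["message"]
    usage = data["usage"]
    return {
        # .get() tolerates models that return no reasoning_content field.
        "has_reasoning": bool(message.get("reasoning_content")),
        "prompt_tokens": usage["prompt_tokens"],
        # Includes reasoning tokens as well as the final answer.
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


sample = """
{
  "choices": [{"message": {"role": "assistant",
                           "reasoning_content": "91 = 7 x 13, so it is not prime.",
                           "content": "No, 91 is not a prime number."}}],
  "usage": {"prompt_tokens": 21, "completion_tokens": 203, "total_tokens": 224}
}
"""
summary = summarize_usage(sample)
print(summary["completion_tokens"], "of", summary["total_tokens"], "tokens were generated")
```

In the sample, 203 of the 224 billed tokens are completion tokens, which covers both the reasoning and the one-sentence answer.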