Example response body:

{
  "id": "<string>",
  "type": "<string>",
  "role": "<string>",
  "content": [
    {
      "type": "<string>",
      "text": "<string>"
    }
  ],
  "model": "<string>",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123
  },
  "stop_sequence": "<string>"
}
Download the OpenAPI schema for code generation and client libraries.
Model APIs accept requests in the Anthropic Messages API format at https://inference.baseten.co/v1/messages.
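Any HTTP client can call this endpoint directly. The sketch below builds an equivalent request with the `requests` package (the model slug mirrors the SDK example; the actual send is left commented out because it needs a valid key):

```python
import os

# Placeholder key: set BASETEN_API_KEY in your environment before sending.
API_KEY = os.environ.get("BASETEN_API_KEY", "YOUR_API_KEY")

url = "https://inference.baseten.co/v1/messages"
headers = {
    "Authorization": f"Bearer {API_KEY}",  # the Api-Key scheme also works
    "Content-Type": "application/json",
}
payload = {
    "model": "deepseek-ai/DeepSeek-V3.1",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Uncomment to send (requires a valid key and the requests package):
# import requests
# response = requests.post(url, headers=headers, json=payload)
# print(response.json()["content"][0]["text"])
```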

Call with the Anthropic SDK

The Anthropic SDK sends the API key as x-api-key by default. Baseten reads Authorization, so override default_headers when creating the client:
import anthropic
import os

API_KEY = os.environ["BASETEN_API_KEY"]

client = anthropic.Anthropic(
    base_url="https://inference.baseten.co",
    api_key=API_KEY,
    # The SDK would otherwise send the key as x-api-key, which Baseten ignores.
    default_headers={"Authorization": f"Bearer {API_KEY}"},
)

response = client.messages.create(
    model="deepseek-ai/DeepSeek-V3.1",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.content[0].text)

Authorizations

Authorization
string
header
required

Pass your Baseten API key using either the Api-Key or Bearer scheme: Authorization: Api-Key YOUR_API_KEY or Authorization: Bearer YOUR_API_KEY. The Anthropic SDK's default x-api-key header is not accepted; override default_headers to send Authorization instead.

Body

application/json

Request body for creating a message.

model
string
required

The model slug to use. Find available models at Model APIs.

messages
InputMessage · object[]
required

The conversation history as an ordered list of input messages. Alternating user and assistant roles are expected; the final message must be from the user.
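A multi-turn history is a list of alternating user and assistant turns ending with a user turn. A small client-side check like the one below (illustrative, not part of the API) catches ordering mistakes before sending:

```python
messages = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
    {"role": "user", "content": "And doubled?"},
]

def is_valid_history(messages):
    """Roles must alternate and the final message must be from the user."""
    if not messages or messages[-1]["role"] != "user":
        return False
    return all(a["role"] != b["role"] for a, b in zip(messages, messages[1:]))

print(is_valid_history(messages))  # expect True
```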

max_tokens
integer
required

The maximum number of tokens to generate in the response. Required by the Messages API. The response may be shorter if it finishes naturally or hits a stop sequence.

Required range: x >= 1
system

A system prompt that sets the model's behavior. Pass either a single string or an array of text content blocks.
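Both forms carry the same instruction; the block form simply allows multiple text blocks. A sketch of the two shapes in a request body:

```python
system_as_string = "You are a terse assistant."
system_as_blocks = [{"type": "text", "text": "You are a terse assistant."}]

request_body = {
    "model": "deepseek-ai/DeepSeek-V3.1",
    "max_tokens": 256,
    "system": system_as_string,  # or system_as_blocks; both are accepted
    "messages": [{"role": "user", "content": "Hello!"}],
}
```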

temperature
number
default:1

Controls randomness. Lower values are more deterministic.

Required range: 0 <= x <= 1
top_p
number

Nucleus sampling: only consider tokens with cumulative probability up to this value.

Required range: x <= 1
top_k
integer

Limits token selection to the top K most probable tokens at each step.

Required range: x >= 0
stop_sequences
string[]

Custom text sequences that will stop generation. When a stop sequence is hit, stop_reason is stop_sequence and stop_sequence contains the matched string.
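The truncation happens server-side, but its effect on the returned text can be mimicked offline. The helper below is purely illustrative, assuming the matched sequence itself is excluded from the returned text:

```python
def apply_stop_sequences(text, stop_sequences):
    """Truncate at the earliest stop sequence; the matched sequence
    itself is not included in the returned text."""
    earliest, matched = len(text), None
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1 and idx < earliest:
            earliest, matched = idx, seq
    return text[:earliest], matched

text, matched = apply_stop_sequences("Step 1. Mix.\nEND\nStep 2.", ["END"])
print(matched)  # expect "END"
```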

stream
boolean
default:false

If true, the response is streamed as server-sent events. Each event has a type such as message_start, content_block_delta, or message_stop.
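Each SSE frame is an `event:` line plus a `data:` line carrying JSON. A minimal parser for the raw stream might look like this (a sketch; the Anthropic SDK's streaming helpers do this for you, and the sample frames below are hand-written for illustration):

```python
import json

def parse_sse(raw):
    """Yield (event_type, payload) pairs from raw server-sent-event text."""
    for frame in raw.strip().split("\n\n"):
        event, data = None, None
        for line in frame.splitlines():
            if line.startswith("event: "):
                event = line[len("event: "):]
            elif line.startswith("data: "):
                data = json.loads(line[len("data: "):])
        if event is not None:
            yield event, data

raw = (
    "event: message_start\ndata: {\"type\": \"message_start\"}\n\n"
    "event: content_block_delta\n"
    "data: {\"type\": \"content_block_delta\", \"delta\": {\"text\": \"Hi\"}}\n\n"
    "event: message_stop\ndata: {\"type\": \"message_stop\"}\n\n"
)
events = list(parse_sse(raw))
print([e for e, _ in events])
# expect ['message_start', 'content_block_delta', 'message_stop']
```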

tools
ToolDefinition · object[]

A list of tools the model may call. Each tool has a name, description, and input_schema (a JSON Schema object).

tool_choice
ToolChoice · object

Controls which tool (if any) the model must call.
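A sketch of the two fields together. The `get_weather` tool is hypothetical, and the `tool_choice` shapes are assumed to follow the Anthropic Messages format:

```python
# Hypothetical tool: name, description, and schema are illustrative.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

tool_choice_auto = {"type": "auto"}                           # model decides
tool_choice_forced = {"type": "tool", "name": "get_weather"}  # force this tool
```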

metadata
Metadata · object

An object describing metadata about the request. Supports user_id for abuse detection.

Response

Successful response

The message response returned by the model.

id
string
required

A unique identifier for this message, such as msg_abc123.

type
string
required

The object type, always message.

Allowed value: "message"
role
string
required

The role of the generated message, always assistant.

Allowed value: "assistant"
content
(TextBlock · object | ToolUseBlock · object)[]
required

An array of content blocks generated by the model. Text responses contain a single text block; responses that invoke tools contain tool_use blocks.

Text blocks have type "text" and a text string; tool_use blocks have type "tool_use" and carry the invoked tool's name and input.
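A response handler typically walks the content array and dispatches on each block's type. A sketch, using hand-written sample blocks (real tool_use blocks also carry an id):

```python
def extract(content_blocks):
    """Collect text and tool calls from a response's content array."""
    text_parts, tool_calls = [], []
    for block in content_blocks:
        if block["type"] == "text":
            text_parts.append(block["text"])
        elif block["type"] == "tool_use":
            tool_calls.append((block["name"], block["input"]))
    return "".join(text_parts), tool_calls

content = [
    {"type": "text", "text": "Checking the weather."},
    {"type": "tool_use", "name": "get_weather", "input": {"city": "Oslo"}},
]
text, calls = extract(content)
```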

model
string
required

The model slug that produced the response.

stop_reason
enum<string>
required

Why the model stopped generating: end_turn (natural stop), max_tokens (hit the max_tokens limit), stop_sequence (matched a stop_sequences entry), or tool_use (model invoked a tool).

Available options:
end_turn,
max_tokens,
stop_sequence,
tool_use
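Callers usually branch on this value after each request; the mapping below is an illustrative summary of sensible follow-ups, not API behavior:

```python
def describe_stop(stop_reason):
    """Map a stop_reason value to a suggested follow-up action."""
    return {
        "end_turn": "done",
        "max_tokens": "response truncated; consider raising max_tokens",
        "stop_sequence": "a stop_sequences entry was matched",
        "tool_use": "execute the requested tool and send back its result",
    }[stop_reason]

print(describe_stop("max_tokens"))
```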
usage
Usage · object
required

Token usage statistics for the request.

stop_sequence
string | null

The stop sequence that was matched, if stop_reason is stop_sequence. Otherwise null.