Create Anthropic Messages API requests against Baseten Model APIs.
POST https://inference.baseten.co/v1/messages
Pass your Baseten API key in the Authorization header, using either the Api-Key or Bearer scheme: Authorization: Api-Key YOUR_API_KEY or Authorization: Bearer YOUR_API_KEY. The Anthropic SDK sends an x-api-key header by default, which Baseten does not accept; override default_headers when creating the client so that it sends Authorization instead.
Request body for creating a message.
The model slug to use. Find available models at Model APIs.
The conversation history as an ordered list of input messages. Alternating user and assistant roles are expected; the final message must be from the user.
The maximum number of tokens to generate in the response. Required by the Messages API; must be at least 1 (x >= 1). The response may be shorter if it finishes naturally or hits a stop sequence.
A system prompt that sets the model's behavior. Pass either a single string or an array of text content blocks.
Controls randomness. Lower values are more deterministic. Range: 0 <= x <= 1.
Nucleus sampling: only consider tokens with cumulative probability up to this value. Constraint: x <= 1.
Limits token selection to the top K most probable tokens at each step. Constraint: x >= 0.
Custom text sequences that will stop generation. When a stop sequence is hit, stop_reason is stop_sequence and stop_sequence contains the matched string.
If true, the response is streamed as server-sent events. Each event has a type such as message_start, content_block_delta, or message_stop.
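The stream format can be sketched with a tiny parser over a hand-written sample; the event payloads below are illustrative of the shapes named above, not captured from a live response:

```python
import json

# Illustrative SSE fragment using the event types described above
# (message_start, content_block_delta, message_stop).
sample_sse = """\
event: message_start
data: {"type": "message_start"}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_delta
data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": ", world"}}

event: message_stop
data: {"type": "message_stop"}
"""

def iter_events(raw: str):
    """Yield the parsed `data:` payload of each server-sent event."""
    for block in raw.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])

# Accumulate the text deltas into the full response text.
text = "".join(
    ev["delta"]["text"]
    for ev in iter_events(sample_sse)
    if ev["type"] == "content_block_delta"
)
print(text)  # prints "Hello, world"
```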
A list of tools the model may call. Each tool has a name, description, and input_schema (a JSON Schema object).
Controls which tool (if any) the model must call.
An object describing metadata about the request. Supports user_id for abuse detection.
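Putting the fields above together, a request body might look like the following sketch; the model slug, tool name, and user_id are placeholders, not real identifiers:

```python
import json

payload = {
    "model": "example-org/example-model",  # placeholder slug
    "max_tokens": 256,                     # required; x >= 1
    "system": "You are a concise assistant.",
    "messages": [
        # Alternating roles; the final message must be from the user.
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.2,
    "stop_sequences": ["\n\nHuman:"],
    "tools": [
        {
            "name": "get_weather",  # placeholder tool
            "description": "Get the current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": {"type": "auto"},
    "metadata": {"user_id": "user-123"},  # placeholder id
}

body = json.dumps(payload)  # serialized request body
```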
Successful response
The message response returned by the model.
A unique identifier for this message, such as msg_abc123.
The object type, always "message".
The role of the generated message, always "assistant".
An array of content blocks generated by the model. Text responses contain a single text block; responses that invoke tools contain tool_use blocks.
A text content block.
The model slug that produced the response.
Why the model stopped generating: end_turn (natural stop), max_tokens (hit the max_tokens limit), stop_sequence (matched a stop_sequences entry), or tool_use (model invoked a tool).
Token usage statistics for the request.
The stop sequence that was matched, if stop_reason is stop_sequence. Otherwise null.
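A sketch of reading these response fields, using a hand-built example message in the documented shape rather than a live API result:

```python
# Illustrative response matching the fields documented above.
response = {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "model": "example-org/example-model",  # placeholder slug
    "content": [{"type": "text", "text": "Paris."}],
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 4},
}

def extract_text(message: dict) -> str:
    """Concatenate the text blocks; tool_use blocks carry no text."""
    return "".join(
        block["text"] for block in message["content"] if block["type"] == "text"
    )

if response["stop_reason"] == "tool_use":
    # Collect tool invocations and run them before continuing the turn.
    tool_calls = [b for b in response["content"] if b["type"] == "tool_use"]
else:
    print(extract_text(response))  # prints "Paris."
```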