Baseten provides OpenAI-compatible API endpoints for all available Model APIs. This means you can use standard OpenAI client libraries—no wrappers, no rewrites, no surprises. If your code already works with OpenAI, it’ll work with Baseten.
This guide walks you through getting started, making your first call, and using advanced features like structured outputs and tool calling.
Prerequisites
Before you begin, make sure you have:
- A Baseten account
- An API key
- The OpenAI client library for your language of choice
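If you're starting from scratch, install the client with pip install openai (Python) or npm install openai (JavaScript), and export your API key as the BASETEN_API_KEY environment variable; the examples below read it from there.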
Supported models
Baseten currently offers several high-performing open-source LLMs as Model APIs:
- DeepSeek R1 0528 (slug: deepseek-ai/DeepSeek-R1-0528)
- DeepSeek V3 0324 (slug: deepseek-ai/DeepSeek-V3-0324)
- Llama 4 Maverick (slug: meta-llama/Llama-4-Maverick-17B-128E-Instruct)
- Llama 4 Scout (slug: meta-llama/Llama-4-Scout-17B-16E-Instruct)
- Qwen 3 🔜
Please update the model in the examples below to the slug of the model you'd like to test.
Make your first API call
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your question here"}
    ]
)

print(response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference.baseten.co/v1",
  apiKey: process.env.BASETEN_API_KEY,
});

// Use the client
try {
  const response = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V3-0324",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello, how are you?" },
    ],
  });
  console.log(response.choices[0].message.content);
} catch (error) {
  console.error("Request failed:", error);
}
curl https://inference.baseten.co/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V3-0324",
    "messages": [{ "role": "user", "content": "Your content here" }],
    "stream": true,
    "max_tokens": 10
  }'
echo # Add a newline for cleaner output
Request parameters
Model APIs support all commonly used OpenAI ChatCompletions parameters, including:
- model: Slug of the model you want to call (see Supported models above)
- messages: Array of message objects (role + content)
- temperature: Controls randomness (0-2, default 1)
- max_tokens: Maximum number of tokens to generate
- stream: Boolean to enable streaming responses
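Here's a minimal sketch (Python, with an illustrative prompt and parameter values) that combines these parameters and consumes a streamed response:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

# Stream a short, low-temperature completion and print tokens as they arrive
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    temperature=0.7,
    max_tokens=100,
    stream=True
)

for chunk in stream:
    # Some chunks carry no content (e.g. role-only or final chunks)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()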
Structured outputs
To get structured JSON output from the model, use the response_format parameter. Set response_format={"type": "json_object"} to enable JSON mode. For more complex output, you can define a JSON schema.
Let’s say you want to extract specific information from a user’s query, like a name and an email address.
import os
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",  # Or any other supported model
    messages=[
        {"role": "system", "content": "You are an expert at extracting information."},
        {"role": "user", "content": "My name is Jane Doe and my email is jane.doe@example.com. I'd like to know more about your services."}
    ],
    response_format={
        "type": "json_object",
        "json_schema": {
            "name": "user_details",
            "description": "User contact information",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The user's full name"
                    },
                    "email": {
                        "type": "string",
                        "description": "The user's email address"
                    }
                },
                "required": ["name", "email"]
            },
            "strict": True  # Enforce schema adherence
        }
    }
)

output = json.loads(response.choices[0].message.content)
print(output)
# Expected output:
# {
#   "name": "Jane Doe",
#   "email": "jane.doe@example.com"
# }
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference.baseten.co/v1",
  apiKey: process.env.BASETEN_API_KEY,
});

// Use the client for structured output
try {
  const response = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V3-0324",
    messages: [
      { role: "system", content: "You are an expert at extracting information." },
      { role: "user", content: "My name is Jane Doe and my email is jane.doe@example.com. I'd like to know more about your services." },
    ],
    response_format: {
      type: "json_object",
      json_schema: {
        name: "user_details",
        description: "User contact information",
        schema: {
          type: "object",
          properties: {
            name: {
              type: "string",
              description: "The user's full name",
            },
            email: {
              type: "string",
              description: "The user's email address",
            },
          },
          required: ["name", "email"],
        },
        strict: true,
      },
    },
  });
  const output = JSON.parse(response.choices[0].message.content);
  console.log(output);
} catch (error) {
  console.error("Request failed:", error);
}
curl -s "https://inference.baseten.co/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Api-Key $BASETEN_API_KEY" \
-d @- << EOF
{
"model": "deepseek-ai/DeepSeek-V3-0324",
"messages": [
{"role": "system", "content": "You are an expert at extracting information."},
{"role": "user", "content": "My name is Jane Doe and my email is jane.doe@example.com. I'd like to know more about your services."}
],
"response_format": {
"type": "json_object",
"json_schema": {
"name": "user_details",
"description": "User contact information",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string", "description": "The user's full name" },
"email": { "type": "string", "description": "The user's email address" }
},
"required": ["name", "email"]
},
"strict": true
}
}
}
EOF
echo # Add a newline for cleaner output
When strict: true is specified within the json_schema, the model is constrained to produce output that strictly adheres to the provided schema. If the model cannot or will not produce output that matches the schema, it may return an error or a refusal.
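Because a refusal or an error message can come back in place of valid JSON, it's worth parsing defensively. A minimal sketch in Python, continuing from the structured-output example above:

# `response` is the chat completion from the structured-output example;
# json is already imported there
content = response.choices[0].message.content

try:
    user_details = json.loads(content)
except (json.JSONDecodeError, TypeError):
    # The model refused or returned non-JSON text; surface it explicitly
    # instead of letting the parse error propagate
    raise ValueError(f"Model did not return valid JSON: {content!r}")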
Tool calling
Model compatibility note: we recommend DeepSeek V3 for tool calling. We do not recommend DeepSeek R1 for tool calling, as that model was not post-trained for it.
Tool calling is fully supported. Simply define a list of tools and pass them via the tools parameter. Each tool has:
- type: The type of tool to call. Currently, the only supported value is function.
- function: A dictionary with the following keys:
  - name: The name of the function to be called
  - description: A description of what the function does
  - parameters: A JSON Schema object describing the function parameters
# Example list of tools
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City and state"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}
Here’s how you might implement tool calling:
import os
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

# Local implementation of the tool (stubbed here for illustration)
def get_weather(location: str) -> str:
    return json.dumps({"location": location, "temperature": "72", "forecast": "sunny"})

# Define the message and available tools
messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

# Make the initial request
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Process tool calls if any
if response.choices[0].message.tool_calls:
    # Get the function call details
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)

    # Call the function and get the result
    function_response = get_weather(location=function_args.get("location"))

    # Add function response to conversation
    messages.append(response.choices[0].message)
    messages.append({
        "tool_call_id": tool_call.id,
        "role": "tool",
        "name": tool_call.function.name,
        "content": function_response
    })

    # Get the final response with the function result
    final_response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3-0324",
        messages=messages
    )
    print(final_response.choices[0].message.content)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference.baseten.co/v1",
  apiKey: process.env.BASETEN_API_KEY,
});

// Local implementation of the tool (stubbed here for illustration)
function getWeather(location) {
  return JSON.stringify({ location, temperature: "72", forecast: "sunny" });
}

// Make initial request with tools
const response = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-V3-0324",
  messages: [{ role: "user", content: "What's the weather like in Boston?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "City name" },
        },
        required: ["location"],
      },
    },
  }],
});

// Process tool calls if any
if (response.choices[0].message.tool_calls) {
  const toolCall = response.choices[0].message.tool_calls[0];
  const args = JSON.parse(toolCall.function.arguments);

  // Call function and get result
  const functionResponse = getWeather(args.location);

  // Submit function result back to model
  const messages = [
    { role: "user", content: "What's the weather like in Boston?" },
    response.choices[0].message,
    { tool_call_id: toolCall.id, role: "tool", name: "get_weather", content: functionResponse },
  ];

  const finalResponse = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-V3-0324",
    messages: messages,
  });
  console.log(finalResponse.choices[0].message.content);
}
curl -s "https://inference.baseten.co/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Api-Key $BASETEN_API_KEY" \
-d '{
"model": "deepseek-ai/DeepSeek-V3-0324",
"messages": [{"role": "user", "content": "What'\''s the weather like in Boston?"}],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
}
}]
}')
# Extract tool call details
TOOL_CALL_ID=$(
# If we have a tool call, prepare the weather data and send it back
if [ -n "$TOOL_CALL_ID" ]; then
WEATHER_DATA='{"location":"Boston","temperature":"72","forecast":"sunny"}'
# Send the function result back to the API
FINAL_RESPONSE=$(curl -s "https://inference.baseten.co/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Api-Key $BASETEN_API_KEY" \
-d '{
"model": "deepseek-ai/DeepSeek-V3-0324",
"messages": [
{"role": "user", "content": "What'\''s the weather like in Boston?"},
{"role": "assistant", "content": null, "tool_calls": [{"id": "'$TOOL_CALL_ID'", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"Boston\"}"}}]},
{"role": "tool", "tool_call_id": "'$TOOL_CALL_ID'", "name": "get_weather", "content": "'$WEATHER_DATA'"}
]
}')
# Print the final response
fi
Error Handling
The API returns standard HTTP error codes:
- 400: Bad request (malformed input)
- 401: Unauthorized (invalid or missing API key)
- 402: Payment required
- 404: Model not found
- 429: Rate limit exceeded
- 500: Internal server error
Check the response body for specific error details and suggested resolutions.
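If you're calling the API through the OpenAI Python SDK, these statuses surface as typed exceptions you can catch; a minimal sketch (the retry advice in the comments is illustrative):

import os
from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

try:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3-0324",
        messages=[{"role": "user", "content": "Your question here"}]
    )
    print(response.choices[0].message.content)
except RateLimitError:
    # 429: back off and retry, or queue the request
    print("Rate limited; retry after a short delay.")
except APIStatusError as e:
    # Any other non-2xx status; the response body explains the failure
    print(f"Request failed with status {e.status_code}: {e.response.text}")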
Migrating from OpenAI
To migrate from OpenAI to Baseten's OpenAI-compatible API, make three changes to your existing code:
- Replace your OpenAI API key with your Baseten API key
- Change the base URL to https://inference.baseten.co/v1
- Update model names to match Baseten-supported slugs
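In practice the diff is usually confined to the client constructor. A sketch in Python, assuming your key is stored in BASETEN_API_KEY:

import os
from openai import OpenAI

# Before:
# client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# After: the same client, pointed at Baseten
client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

# Existing chat.completions calls work unchanged, aside from the model slug
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)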