Model APIs provide instant access to high-performance LLMs through OpenAI-compatible endpoints. Point your existing OpenAI SDK at Baseten’s inference endpoint and start making calls; no model deployment is required. Unlike self-deployed models, where you’d configure hardware, engines, and scaling yourself, Model APIs run on shared infrastructure that Baseten manages. You get a fixed set of popular models with optimized serving out of the box. When you need a model that isn’t in the supported list, or want dedicated GPUs with custom scaling, deploy your own with Truss and call it through the predict API.

Supported models

Enable a model from the Model APIs page in the Baseten dashboard.

Pricing

Pricing is per million tokens. Every request participates in caching automatically; there are no flags to set. When a request’s prefix matches a previously cached prefix, those tokens are billed at the cached input rate; all other input tokens are billed at the uncached input rate.
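The split between cached and uncached rates makes a request's input cost a simple weighted sum. A small helper makes the arithmetic concrete — the rates below are made up for illustration; check the pricing table for real numbers:

```python
def input_token_cost(cached_tokens: int, uncached_tokens: int,
                     cached_rate: float, uncached_rate: float) -> float:
    """Dollar cost of one request's input tokens.

    Rates are in dollars per million tokens. The cached/uncached split is
    reported per request by the API; it is not something the caller sets.
    """
    return (cached_tokens * cached_rate + uncached_tokens * uncached_rate) / 1_000_000

# Hypothetical rates: $0.10/M cached, $1.00/M uncached.
# 900k cached + 100k uncached tokens -> $0.09 + $0.10 = $0.19
cost = input_token_cost(900_000, 100_000, cached_rate=0.10, uncached_rate=1.00)
```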

Feature support

All models support tool calling. Support for other features varies by model; see Reasoning for configuration details.
GLM models and Nemotron Super also support the top_p and top_k sampling parameters.

Create a chat completion

If you’ve already completed the quickstart, you have a working client. The examples below show a multi-turn conversation with a system message that you can adapt for your application.
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ["BASETEN_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "What is gradient descent?"},
        {"role": "assistant", "content": "An optimization algorithm that iteratively adjusts model parameters by moving in the direction of steepest decrease in the loss function."},
        {"role": "user", "content": "How does the learning rate affect it?"}
    ],
)

print(response.choices[0].message.content)
Replace the model slug with any slug from the supported models table.

Compatible features

Model APIs follow the OpenAI Chat Completions API, so you can use structured outputs, tool calling, reasoning, vision, and streaming (stream: true) with the same parameters you’d use with OpenAI. Check the feature support table for per-model availability. For the complete parameter reference, see the Chat Completions API documentation.

List available models

Query the /v1/models endpoint for the current list of models with metadata including pricing, context lengths, and supported features.
curl https://inference.baseten.co/v1/models \
  -H "Authorization: Api-Key $BASETEN_API_KEY"
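The same query from Python using only the standard library — it mirrors the curl command above, and the network call only runs when BASETEN_API_KEY is set (the field names in the loop assume an OpenAI-style list response):

```python
import json
import os
import urllib.request

# GET /v1/models with the same Api-Key authorization header as the curl example.
req = urllib.request.Request(
    "https://inference.baseten.co/v1/models",
    headers={"Authorization": f"Api-Key {os.environ.get('BASETEN_API_KEY', '')}"},
)

if os.environ.get("BASETEN_API_KEY"):  # skip the network call without a key
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    for model in body.get("data", []):
        print(model["id"])
```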

Migrate from OpenAI

To migrate existing OpenAI code to Baseten, change three values:
  1. Replace your API key with a Baseten API key.
  2. Change the base URL to https://inference.baseten.co/v1.
  3. Update the model name to a Baseten model slug.
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",  # 2. Baseten base URL
    api_key=os.environ["BASETEN_API_KEY"],  # 1. Baseten API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",  # 3. Baseten model slug
    messages=[{"role": "user", "content": "Hello"}],
)

Handle errors

Model APIs return standard HTTP error codes:
Code  Meaning
400   Invalid request (check your parameters)
401   Invalid or missing API key
402   Payment required
404   Model not found
429   Rate limit exceeded
500   Internal server error
Each error response includes a JSON body with details about the issue and suggested resolutions.
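Of these, 429 and 500 are usually transient and worth retrying with exponential backoff, while 400/401/402/404 indicate a problem with the request itself. A minimal retry sketch, assuming the raised exception carries the HTTP status as a status_code attribute (the OpenAI SDK’s APIStatusError does; adapt the attribute lookup for other clients):

```python
import time

RETRYABLE_STATUS = {429, 500}

def with_retries(call, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Run call(), retrying on 429/500 with capped exponential backoff.

    Assumes failures raise an exception exposing a `status_code` attribute;
    non-retryable errors (400, 401, 402, 404) are re-raised immediately.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_retries - 1:
                raise
            # Delay doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            time.sleep(min(max_delay, base_delay * 2 ** attempt))
```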

Next steps

Reasoning

Control extended thinking for complex tasks

Vision

Send images and videos alongside text

Rate limits

Understand and configure rate limits

API reference

Complete parameter documentation