Baseten provides OpenAI-compatible API endpoints for all available Model APIs. This means you can use standard OpenAI client libraries—no wrappers, no rewrites, no surprises. If your code already works with OpenAI, it’ll work with Baseten.

This guide walks you through getting started, making your first call, and using advanced features like structured outputs and tool calling.

Prerequisites

Before you begin, make sure you have:

  1. A Baseten account
  2. An API key
  3. The OpenAI client library for your language of choice (Python install shown below)
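
For Python, the client is available from PyPI:

pip install openai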

Supported models

Baseten currently offers several high-performing open-source LLMs as Model APIs:

  • DeepSeek R1 (slug: deepseek-ai/DeepSeek-R1)
  • DeepSeek V3 (slug: deepseek-ai/DeepSeek-V3-0324)
  • Llama 4 Maverick (slug: meta-llama/Llama-4-Maverick-17B-128E-Instruct)
  • Llama 4 Scout (slug: meta-llama/Llama-4-Scout-17B-16E-Instruct)
  • Qwen 3 (coming soon)

Please update the model in the examples below to the slug of the model you’d like to test.

Make your first API call

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your question here"}
    ]
)

print(response.choices[0].message.content)

Request parameters

Model APIs support all commonly used OpenAI Chat Completions parameters, including:

  • model: Slug of the model you want to call (see supported models above)
  • messages: Array of message objects (role + content)
  • temperature: Controls randomness (0-2, default 1)
  • max_tokens: Maximum number of tokens to generate
  • stream: Boolean to enable streaming responses (see the example below)
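
Here's a minimal sketch of a streaming request that combines these parameters; the client setup is the same as above, and the model slug is one of the supported models listed earlier:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

# Request a streamed completion; tokens arrive incrementally as chunks
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    temperature=0.7,   # Lower values give more deterministic output
    max_tokens=256,    # Upper bound on generated tokens
    stream=True
)

for chunk in stream:
    # Each chunk's delta holds the next slice of generated text
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)
print()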

Structured outputs

To get structured JSON output from the model, use the response_format parameter. Setting response_format={"type": "json_object"} enables basic JSON mode. For more complex requirements, set {"type": "json_schema"} and provide a JSON schema for the model to follow, as in the example below.

Let’s say you want to extract specific information from a user’s query, like a name and an email address.

import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324", # Or any other supported model
    messages=[
        {"role": "system", "content": "You are an expert at extracting information."},
        {"role": "user", "content": "My name is Jane Doe and my email is jane.doe@example.com. I'd like to know more about your services."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_details",
            "description": "User contact information",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The user's full name"
                    },
                    "email": {
                        "type": "string",
                        "description": "The user's email address"
                    }
                },
                "required": ["name", "email"]
            },
            "strict": True # Enforce schema adherence
        }
    }
)

output = json.loads(response.choices[0].message.content)
print(output)
# Expected output:
# {
#   "name": "Jane Doe",
#   "email": "jane.doe@example.com"
# }

When strict: true is specified within the json_schema, the model is constrained to produce output that strictly adheres to the provided schema. If the model cannot or will not produce output that matches the schema, it may return an error or a refusal.
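
Since a refusal or malformed output is still possible, it's worth guarding the parse step. A minimal sketch, reusing the response from the example above:

try:
    output = json.loads(response.choices[0].message.content)
except json.JSONDecodeError:
    # The model returned something other than valid JSON (e.g. a refusal)
    print("Could not parse structured output:", response.choices[0].message.content)
else:
    print(output["name"], output["email"])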

Tool calling

Model compatibility note: We recommend using DeepSeek V3 for tool calling. We do not recommend DeepSeek R1 for tool calling, as that model was not post-trained for it.

Tool calling is fully supported. Simply define a list of tools and pass them via the tools parameter:

  • type: The type of tool to call. Currently, the only supported value is function.
  • function: A dictionary with the following keys:
    • name: The name of the function to be called
    • description: A description of what the function does
    • parameters: A JSON Schema object describing the function parameters

# Example list of tools
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City and state"
            }
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Here’s how you might implement tool calling:

import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

# Placeholder implementation -- in practice this would call a real weather API
def get_weather(location):
    return f"The weather in {location} is sunny and 72°F."


# Define the message and available tools
messages = [{"role": "user", "content": "What's the weather like in Boston?"}]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

# Make the initial request
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Process tool calls if any
if response.choices[0].message.tool_calls:
    # Get the function call details
    tool_call = response.choices[0].message.tool_calls[0]
    function_args = json.loads(tool_call.function.arguments)
    
    # Call the function and get the result
    function_response = get_weather(location=function_args.get("location"))
    
    # Add function response to conversation
    messages.append(response.choices[0].message)
    messages.append({
        "tool_call_id": tool_call.id,
        "role": "tool",
        "name": tool_call.function.name,
        "content": function_response
    })
    
    # Get the final response with the function result
    final_response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3-0324",
        messages=messages
    )
    print(final_response.choices[0].message.content)

Error Handling

The API returns standard HTTP error codes:

  • 400: Bad request (malformed input)
  • 401: Unauthorized (invalid or missing API key)
  • 402: Payment required
  • 404: Model not found
  • 429: Rate limit exceeded
  • 500: Internal server error

Check the response body for specific error details and suggested resolutions.
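
With the Python client, these status codes surface as typed exceptions, so you can handle them individually. A minimal sketch (the client is configured as in the earlier examples):

import openai

try:
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3-0324",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except openai.AuthenticationError as e:   # 401
    print("Invalid or missing API key:", e)
except openai.NotFoundError as e:         # 404
    print("Model not found; check the slug:", e)
except openai.RateLimitError as e:        # 429
    print("Rate limit exceeded; retry with backoff:", e)
except openai.APIStatusError as e:        # Any other non-2xx response
    print(f"API error {e.status_code}: {e.message}")
else:
    print(response.choices[0].message.content)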

Migrating from OpenAI

To migrate from OpenAI to Baseten’s OpenAI-compatible API, you need to make these changes to your existing code:

  1. Replace your OpenAI API key with your Baseten API key.
  2. Change the base URL to https://inference.baseten.co/v1.
  3. Update model names to match Baseten-supported slugs, as shown below.
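
In code, the migration is typically just the client constructor plus the model slug. A minimal sketch, assuming your key is stored in a BASETEN_API_KEY environment variable:

import os

from openai import OpenAI

# Before: client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
client = OpenAI(
    base_url="https://inference.baseten.co/v1",   # Change 2: new base URL
    api_key=os.environ["BASETEN_API_KEY"]         # Change 1: Baseten API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",         # Change 3: Baseten model slug
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)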