Model APIs provide instant access to high-performance LLMs through OpenAI-compatible endpoints. Point your existing OpenAI SDK at Baseten’s inference endpoint and start making calls—no model deployment required.

Prerequisites

To use Model APIs, you need:
  1. A Baseten account
  2. An API key
  3. The OpenAI SDK for your language
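Assuming Python, setup amounts to installing the SDK and exporting the key (the variable name BASETEN_API_KEY matches the examples below; replace the placeholder with your own key):

```shell
# Install the OpenAI Python SDK
pip install openai

# Make your Baseten API key available to the code examples below
export BASETEN_API_KEY="YOUR_API_KEY"
```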

Supported models

Enable a model from the Model APIs page in the Baseten dashboard.
| Model | Slug | Context |
| --- | --- | --- |
| OpenAI GPT OSS 120B | openai/gpt-oss-120b | 128k |
| DeepSeek V3.2 | deepseek-ai/DeepSeek-V3.2 | 131k |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | 164k |
| DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 | 164k |
| Kimi K2 Thinking | moonshotai/Kimi-K2-Thinking | 262k |
| Kimi K2 0905 | moonshotai/Kimi-K2-Instruct-0905 | 128k |
| Qwen3 Coder 480B | Qwen/Qwen3-Coder-480B-A35B-Instruct | 262k |
| GLM 4.7 | zai-org/GLM-4.7 | 200k |
| GLM 4.6 | zai-org/GLM-4.6 | 200k |

Create a chat completion

Initialize the OpenAI client with Baseten’s base URL and your API key:
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ.get("BASETEN_API_KEY")
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain gradient descent in one sentence."}
    ]
)

print(response.choices[0].message.content)
Replace the model slug with any model from the supported models table.

Features

Model APIs support the full OpenAI Chat Completions API:
  • Structured outputs: Generate JSON that conforms to a schema.
  • Tool calling: Let the model call functions you define.
  • Reasoning: Control extended thinking for complex tasks.
  • Streaming: Set stream=True in the SDK (stream: true over raw HTTP) to receive responses as server-sent events.
For the complete parameter reference, see the Chat Completions API documentation.
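As a concrete example of structured outputs, the Chat Completions API accepts a JSON Schema through the response_format parameter. The helper below only assembles that payload (a sketch assuming the standard json_schema format; the helper and demo function names are illustrative):

```python
import os

def json_schema_format(name, schema):
    """Build a response_format payload that constrains output to a JSON Schema."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }

def structured_demo():
    """Request a completion whose output must match a simple object schema."""
    from openai import OpenAI  # deferred so json_schema_format works without the SDK

    client = OpenAI(
        base_url="https://inference.baseten.co/v1",
        api_key=os.environ.get("BASETEN_API_KEY"),
    )
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3.2",
        messages=[{"role": "user", "content": "Give me a city and its country."}],
        response_format=json_schema_format(
            "city",
            {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
            },
        ),
    )
    print(response.choices[0].message.content)  # JSON conforming to the schema
```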

Migrate from OpenAI

To migrate existing OpenAI code to Baseten, change three values:
  1. Replace your API key with a Baseten API key.
  2. Change the base URL to https://inference.baseten.co/v1.
  3. Update the model name to a Baseten model slug.
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",  # 2. Baseten base URL
    api_key=os.environ["BASETEN_API_KEY"]  # 1. Baseten API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.2",  # 3. Baseten model slug
    messages=[{"role": "user", "content": "Hello"}]
)

Handle errors

Model APIs return standard HTTP error codes:
| Code | Meaning |
| --- | --- |
| 400 | Invalid request (check your parameters) |
| 401 | Invalid or missing API key |
| 402 | Payment required |
| 404 | Model not found |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
The response body contains details about the error and suggested resolutions.
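In client code, only some of these codes are worth retrying: 429 and 500 can succeed on a later attempt, while 400/401/402/404 will not change without fixing the request. A minimal retry sketch (the ApiError class stands in for the SDK's status error, which similarly carries a status_code attribute; the backoff values are illustrative):

```python
import time

# Status codes where retrying the same request can succeed (per the table above).
RETRYABLE = {429, 500}

class ApiError(Exception):
    """Illustrative stand-in for an HTTP error raised by the client SDK."""
    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code

def call_with_retries(make_request, max_attempts=3, base_delay=1.0):
    """Call make_request, retrying 429/500 with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return make_request()
        except ApiError as err:
            if err.status_code not in RETRYABLE or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```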

Next steps