Model APIs provide instant access to high-performance LLMs through OpenAI-compatible endpoints. Point your existing OpenAI SDK at Baseten’s inference endpoint and start making calls, no model deployment required. Unlike self-deployed models, where you configure hardware, engines, and scaling yourself, Model APIs run on shared infrastructure that Baseten manages. You get a fixed set of popular models with optimized serving out of the box. When you need a model that isn’t in the supported list, or want dedicated GPUs with custom scaling, deploy your own with Truss.

Supported models

Enable a model from the Model APIs page in the Baseten dashboard.
| Model | Slug | Context | Max output |
| --- | --- | --- | --- |
| DeepSeek V3 0324 | deepseek-ai/DeepSeek-V3-0324 | 164k | 131k |
| DeepSeek V3.1 | deepseek-ai/DeepSeek-V3.1 | 164k | 131k |
| GLM 4.6 | zai-org/GLM-4.6 | 200k | 200k |
| GLM 4.7 | zai-org/GLM-4.7 | 200k | 200k |
| GLM 5 | zai-org/GLM-5 | 203k | 203k |
| Kimi K2.5 | moonshotai/Kimi-K2.5 | 262k | 262k |
| Minimax M2.5 | MiniMaxAI/MiniMax-M2.5 | 204k | 204k |
| OpenAI GPT OSS 120B | openai/gpt-oss-120b | 128k | 128k |

Pricing

Pricing is per million tokens.
| Model | Input | Output |
| --- | --- | --- |
| OpenAI GPT OSS 120B | $0.10 | $0.50 |
| Minimax M2.5 | $0.30 | $1.20 |
| DeepSeek V3.1 | $0.50 | $1.50 |
| GLM 4.6 | $0.60 | $2.20 |
| GLM 4.7 | $0.60 | $2.20 |
| Kimi K2.5 | $0.60 | $3.00 |
| DeepSeek V3 0324 | $0.77 | $0.77 |
| GLM 5 | $0.95 | $3.15 |
Query the /v1/models endpoint for current pricing.

Feature support

All models support tool calling. Support for other features varies by model. See Reasoning for configuration details.
| Model | JSON mode | Structured outputs | Reasoning | Vision |
| --- | --- | --- | --- | --- |
| DeepSeek V3 0324 | Yes | Yes | Enabled by default | No |
| DeepSeek V3.1 | No | No | Enabled by default | No |
| GLM 4.6 | Yes | Yes | Opt-in | No |
| GLM 4.7 | Yes | Yes | Opt-in | No |
| GLM 5 | Yes | Yes | No | No |
| Kimi K2.5 | Yes | Yes | Opt-in | Yes |
| Minimax M2.5 | Yes | Yes | Enabled by default | No |
| OpenAI GPT OSS 120B | Yes | Yes | Enabled by default | No |
GLM models also support top_p and top_k sampling parameters.
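For models with structured-outputs support, the OpenAI SDK's `response_format` parameter accepts a JSON schema that constrains the model's reply. A minimal sketch; the `contact_card` schema, its fields, and the `extraction_schema` helper are illustrative, not part of any SDK:

```python
import json
import os


def extraction_schema() -> dict:
    """Build an OpenAI-style response_format payload for structured outputs.

    The contact_card schema is an illustrative example.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "contact_card",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                },
                "required": ["name", "email"],
                "additionalProperties": False,
            },
        },
    }


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(
        base_url="https://inference.baseten.co/v1",
        api_key=os.environ["BASETEN_API_KEY"],
    )
    response = client.chat.completions.create(
        model="zai-org/GLM-4.6",  # any model with structured-outputs support
        messages=[
            {"role": "user", "content": "Extract: Ada Lovelace, ada@example.com"}
        ],
        response_format=extraction_schema(),
    )
    # The message content is a JSON string conforming to the schema.
    print(json.loads(response.choices[0].message.content))
```

Check the feature support table above before relying on structured outputs for a given model (DeepSeek V3.1, for example, does not support it).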

Create a chat completion

If you’ve already completed the quickstart, you have a working client. The examples below show a multi-turn conversation with a system message, which you can adapt for your application.
```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ["BASETEN_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "What is gradient descent?"},
        {"role": "assistant", "content": "An optimization algorithm that iteratively adjusts model parameters by moving in the direction of steepest decrease in the loss function."},
        {"role": "user", "content": "How does the learning rate affect it?"},
    ],
)

print(response.choices[0].message.content)
```
Replace the model slug with any model from the supported models table.

Features

Model APIs are compatible with the OpenAI Chat Completions API. Available features include structured outputs, tool calling, reasoning, vision, and streaming (stream: true). Not all models support every feature. See feature support for per-model availability. For the complete parameter reference, see the Chat Completions API documentation.
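To stream tokens as they are generated, pass `stream=True` and iterate over the returned chunks. A minimal sketch; the `join_deltas` helper is illustrative, not part of any SDK:

```python
import os


def join_deltas(deltas) -> str:
    """Concatenate streamed content deltas, skipping empty/None entries."""
    return "".join(d for d in deltas if d)


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(
        base_url="https://inference.baseten.co/v1",
        api_key=os.environ["BASETEN_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3.1",
        messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks may carry no content delta (e.g. role or finish markers).
        if chunk.choices and chunk.choices[0].delta.content:
            delta = chunk.choices[0].delta.content
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    full_text = join_deltas(parts)
```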

List available models

Query the /v1/models endpoint for the current list of models with metadata including pricing, context lengths, and supported features.
```shell
curl https://inference.baseten.co/v1/models \
  -H "Authorization: Api-Key $BASETEN_API_KEY"
```
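The same endpoint is reachable through the OpenAI SDK's `client.models.list()`. A minimal sketch; the `summarize_models` helper is illustrative, and extended metadata such as pricing may only appear in the raw JSON rather than as typed SDK attributes:

```python
import os


def summarize_models(model_ids) -> str:
    """Return a sorted, one-per-line summary of model IDs."""
    return "\n".join(sorted(model_ids))


if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI(
        base_url="https://inference.baseten.co/v1",
        api_key=os.environ["BASETEN_API_KEY"],
    )
    models = client.models.list()
    print(summarize_models(m.id for m in models.data))
```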

Migrate from OpenAI

To migrate existing OpenAI code to Baseten, change three values:
  1. Replace your API key with a Baseten API key.
  2. Change the base URL to https://inference.baseten.co/v1.
  3. Update the model name to a Baseten model slug.
```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://inference.baseten.co/v1",  # 2. Baseten base URL
    api_key=os.environ["BASETEN_API_KEY"],       # 1. Baseten API key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3.1",  # 3. Baseten model slug
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
```

Handle errors

Model APIs return standard HTTP error codes:
| Code | Meaning |
| --- | --- |
| 400 | Invalid request (check your parameters) |
| 401 | Invalid or missing API key |
| 402 | Payment required |
| 404 | Model not found |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
The response body contains details about the error and suggested resolutions.
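In practice, 429 and 500 are worth retrying with backoff, while 4xx client errors need a fix on your side. A sketch of that policy using the OpenAI SDK's `APIStatusError`; the `should_retry` helper and the retry counts are illustrative:

```python
import time

RETRYABLE = {429, 500}


def should_retry(status_code: int) -> bool:
    """Retry rate limits and server errors; other codes need a code/config fix."""
    return status_code in RETRYABLE


if __name__ == "__main__":
    import os
    from openai import APIStatusError, OpenAI

    client = OpenAI(
        base_url="https://inference.baseten.co/v1",
        api_key=os.environ["BASETEN_API_KEY"],
    )
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model="deepseek-ai/DeepSeek-V3.1",
                messages=[{"role": "user", "content": "Hello"}],
            )
            print(response.choices[0].message.content)
            break
        except APIStatusError as err:
            if should_retry(err.status_code) and attempt < 2:
                time.sleep(2 ** attempt)  # simple exponential backoff
                continue
            raise
```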

Next steps