Model APIs
Model APIs give you instant access to popular open-source LLMs with optimized serving. Baseten manages the infrastructure (shared GPU clusters, model weights, and serving configuration), so there is no deployment step and nothing to configure. The supported catalog includes models such as DeepSeek, GLM, and Kimi; all models support tool calling, and most support structured outputs. Pricing is per million tokens. Because Model APIs implement the OpenAI chat completions format, switching from OpenAI to Baseten requires only changing the base URL and API key in your existing client. All requests route through a single shared endpoint.
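As a sketch of that drop-in compatibility, the snippet below builds a standard chat completions request against Baseten's shared endpoint using only the standard library. The base URL and the model slug here are assumptions for illustration; check your dashboard for the exact values.

```python
import json
import urllib.request

# Assumed base URL for Baseten's shared Model APIs endpoint.
BASE_URL = "https://inference.baseten.co/v1"


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-style chat completions request; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# The same messages work against OpenAI itself by swapping BASE_URL and the key.
req = build_chat_request(
    "YOUR_API_KEY",
    "deepseek-ai/DeepSeek-V3",  # hypothetical slug; use a model from the catalog
    [{"role": "user", "content": "Hello"}],
)
```

Because only the URL and credentials differ, any OpenAI-compatible client library can be pointed at Baseten the same way.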
Deployed model endpoints

When you deploy a custom model or chain with Truss, Baseten assigns it a dedicated subdomain for routing. This is the path for models that aren't in the Model APIs catalog: models with custom serving logic, fine-tuned weights, or multi-step inference pipelines built as chains. You control the hardware, scaling behavior, and serving engine. Each endpoint URL includes a deployment target: an environment name like production, the development deployment, or a specific deployment ID.
Endpoint URLs use the following path parameters:
- model_id: the model's alphanumeric ID, found in your model dashboard.
- chain_id: the chain's alphanumeric ID, found in your chain dashboard.
- deployment_type_or_id: either development, production, or a specific deployment's alphanumeric ID.
- endpoint: the API action, such as predict.
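Putting these parameters together, a deployed model's endpoint URL can be assembled as below. The paths come from the tables that follow; the model-{model_id}.api.baseten.co subdomain pattern is an assumption based on the dedicated-subdomain scheme described above.

```python
from typing import Optional


def model_endpoint_url(model_id: str, endpoint: str = "predict", *,
                       env_name: Optional[str] = None,
                       deployment_id: Optional[str] = None) -> str:
    """Assemble a deployed-model endpoint URL.

    With no target keyword, the URL addresses the development deployment.
    """
    # Assumed subdomain pattern for dedicated model routing.
    base = f"https://model-{model_id}.api.baseten.co"
    if env_name is not None:
        return f"{base}/environments/{env_name}/{endpoint}"
    if deployment_id is not None:
        return f"{base}/deployment/{deployment_id}/{endpoint}"
    return f"{base}/development/{endpoint}"
```

Chains follow the same scheme with chain_id in place of model_id.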
Predict endpoints
The same endpoint paths apply to both models and chains.
| Method | Endpoint | Description |
|---|---|---|
| POST | /environments/{env_name}/predict | Call an environment. |
| POST | /development/predict | Call the development deployment. |
| POST | /deployment/{deployment_id}/predict | Call any deployment. |
| POST | /environments/{env_name}/async_predict | For async inference, call the deployment associated with the specified environment. |
| POST | /development/async_predict | For async inference, call the development deployment. |
| POST | /deployment/{deployment_id}/async_predict | For async inference, call any published deployment of your model. |
| WEBSOCKET | /environments/{env_name}/websocket | For WebSockets, connect to an environment. |
| WEBSOCKET | /development/websocket | For WebSockets, connect to the development deployment. |
| WEBSOCKET | /deployment/{deployment_id}/websocket | For WebSockets, connect to a deployment. |
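As a minimal sketch of queuing an async prediction against the development deployment: the request body field names (model_input, webhook_endpoint) and the Api-Key authorization scheme are assumptions here; verify them against your workspace before relying on them.

```python
import json
import urllib.request


def build_async_predict_request(model_id: str, api_key: str, model_input: dict,
                                webhook: str) -> urllib.request.Request:
    """Build an async_predict call for the development deployment.

    The response (once sent with urllib.request.urlopen) should contain a
    request_id to poll, or to correlate with the webhook delivery.
    """
    url = f"https://model-{model_id}.api.baseten.co/development/async_predict"
    # Assumed body field names for async inference.
    body = {"model_input": model_input, "webhook_endpoint": webhook}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Api-Key {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Swapping /development/ for /environments/{env_name}/ or /deployment/{deployment_id}/ targets the other deployment types from the table above.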
Async status endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /async_request/{request_id} | Get the status of a model async request. |
| GET | /async_request/{request_id} | Get the status of a chain async request. |
| DELETE | /async_request/{request_id} | Cancel an async request. |
| GET | /environments/{env_name}/async_queue_status | Get the async queue status for a model associated with the specified environment. |
| GET | /development/async_queue_status | Get the status of a development deployment's async queue. |
| GET | /deployment/{deployment_id}/async_queue_status | Get the status of a deployment's async queue. |
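A polling loop over the status endpoint might look like the sketch below. The status field name and terminal values are assumptions; when you supply a webhook, polling is unnecessary.

```python
import json
import time
import urllib.request


def build_status_request(model_id: str, api_key: str, request_id: str) -> urllib.request.Request:
    """Build a GET for the status of a queued async request."""
    url = f"https://model-{model_id}.api.baseten.co/async_request/{request_id}"
    return urllib.request.Request(url, headers={"Authorization": f"Api-Key {api_key}"})


def wait_for_result(model_id: str, api_key: str, request_id: str,
                    poll_seconds: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll until the request reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(build_status_request(model_id, api_key, request_id)) as resp:
            status = json.load(resp)
        # Terminal state names are assumed for illustration.
        if status.get("status") in ("SUCCEEDED", "FAILED", "EXPIRED"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"async request {request_id} still pending after {timeout}s")
```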
Wake endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /production/wake | Wake the production environment of your model. |
| POST | /development/wake | Wake the development deployment of your model. |
| POST | /deployment/{deployment_id}/wake | Wake any deployment of your model. |
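A wake call can be issued ahead of traffic to pre-warm a scaled-to-zero deployment; a minimal sketch, assuming the wake endpoint takes an empty POST body:

```python
import urllib.request


def build_wake_request(model_id: str, api_key: str, target: str = "production") -> urllib.request.Request:
    """Build a wake call for a deployment.

    target is one of 'production', 'development', or 'deployment/{deployment_id}'.
    """
    url = f"https://model-{model_id}.api.baseten.co/{target}/wake"
    return urllib.request.Request(
        url,
        data=b"",  # assumed: wake takes no request body
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )
```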