Model APIs
Model APIs give you instant access to popular open-source LLMs with optimized serving. Baseten manages the infrastructure (shared GPU clusters, model weights, and serving configuration), so there's no deployment step and nothing to configure. The supported catalog includes models like DeepSeek, GLM, and Kimi, with all models supporting tool calling and most supporting structured outputs. Pricing is per million tokens. Because Model APIs implement the OpenAI chat completions format, switching from OpenAI to Baseten requires only changing the base URL and API key in your existing client. All requests route through a single endpoint.

Deployed model endpoints
When you deploy a custom model or chain with Truss, Baseten assigns it a dedicated subdomain for routing. This is the path for models that aren't in the Model APIs catalog: models with custom serving logic, fine-tuned weights, or multi-step inference pipelines built as chains. You control the hardware, scaling behavior, and serving engine. Each endpoint URL includes a deployment target: an environment name like `production`, the development deployment, or a specific deployment ID.
For models and chains, the path parameters are:

- `model_id`: the model's alphanumeric ID, found in your model dashboard.
- `chain_id`: the chain's alphanumeric ID, found in your chain dashboard.
- `deployment_type_or_id`: either `development`, `production`, or a specific deployment's alphanumeric ID.
- `endpoint`: the API action, such as `predict`.
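Putting the parameters above together, a predict URL can be assembled like this. A minimal sketch: the `model-{model_id}.api.baseten.co` host pattern is an assumption based on the dedicated-subdomain routing described above, so verify the exact base URL in your model dashboard.

```python
# Sketch: assemble a model predict URL from the path parameters above.
# The host pattern below is assumed; copy the real base URL from your dashboard.
def predict_url(model_id: str, deployment_type_or_id: str = "production") -> str:
    """Join the model's subdomain with a deployment target and the predict endpoint."""
    host = f"https://model-{model_id}.api.baseten.co"
    if deployment_type_or_id in ("production", "development"):
        return f"{host}/{deployment_type_or_id}/predict"
    # Anything else is treated as a specific deployment's alphanumeric ID.
    return f"{host}/deployment/{deployment_type_or_id}/predict"

print(predict_url("abc123"))            # production target
print(predict_url("abc123", "q7rs9"))   # a specific (hypothetical) deployment ID
```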
Predict endpoints
All predict endpoints accept a JSON request body that is forwarded directly to the model's `predict` function (for models) or chain entrypoint (for chains).
- Models
- Chains
- Regional
| Method | Endpoint | Description |
|---|---|---|
| POST | `/production/predict` | Call the production environment. |
| POST | `/environments/{env_name}/predict` | Call a named environment. |
| POST | `/development/predict` | Call the development deployment. |
| POST | `/deployment/{deployment_id}/predict` | Call a specific deployment. |
| POST | `/production/async_predict` | Async predict on production. |
| POST | `/environments/{env_name}/async_predict` | Async predict on a named environment. |
| POST | `/development/async_predict` | Async predict on the development deployment. |
| POST | `/deployment/{deployment_id}/async_predict` | Async predict on a specific deployment. |
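To illustrate the predict routes above, the following sketch builds (but does not send) a request against the production predict endpoint using only the standard library. The host pattern and the `Api-Key` authorization scheme are assumptions, not verified values; copy the exact URL and header format from your dashboard.

```python
import json
import urllib.request

def build_predict_request(model_id: str, model_input: dict, api_key: str) -> urllib.request.Request:
    """Build a POST /production/predict request; the JSON body is forwarded
    as-is to the model's predict function."""
    return urllib.request.Request(
        url=f"https://model-{model_id}.api.baseten.co/production/predict",  # assumed host pattern
        data=json.dumps(model_input).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_predict_request("abc123", {"prompt": "Hello"}, "YOUR_API_KEY")
# urllib.request.urlopen(req) would send it and return the model's response.
```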
Status endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | `/async_request/{request_id}` | Get the status of an async request. |
| DELETE | `/async_request/{request_id}` | Cancel a queued async request. |
| GET | `/production/async_queue_status` | Queue status for production. |
| GET | `/environments/{env_name}/async_queue_status` | Queue status for a named environment. |
| GET | `/development/async_queue_status` | Queue status for the development deployment. |
| GET | `/deployment/{deployment_id}/async_queue_status` | Queue status for a specific deployment. |
| GET | `/async_queue_status` | Queue status (regional). |
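A typical use of `GET /async_request/{request_id}` is to poll until the request leaves the queue. A hedged sketch: the terminal status names here (`SUCCEEDED`, `FAILED`, `CANCELED`) are assumptions for illustration, and the `fetch` callable stands in for an authenticated GET so the loop itself needs no network.

```python
import time

def wait_for_async(request_id: str, fetch, interval: float = 2.0, attempts: int = 30):
    """Poll an async request's status endpoint until it reaches a terminal state.
    `fetch` takes a path and returns the parsed JSON status payload."""
    for _ in range(attempts):
        status = fetch(f"/async_request/{request_id}")["status"]
        if status in ("SUCCEEDED", "FAILED", "CANCELED"):  # assumed status names
            return status
        time.sleep(interval)
    raise TimeoutError(f"async request {request_id} still pending")

# Stub fetcher standing in for an authenticated GET against the model subdomain:
responses = iter([{"status": "QUEUED"}, {"status": "SUCCEEDED"}])
result = wait_for_async("req-1", lambda path: next(responses), interval=0.0)
print(result)  # SUCCEEDED
```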
Wake endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | `/production/wake` | Wake the production environment. |
| POST | `/environments/{env_name}/wake` | Wake a named environment. |
| POST | `/development/wake` | Wake the development deployment. |
| POST | `/deployment/{deployment_id}/wake` | Wake a specific deployment. |
| POST | `/wake` | Wake (regional). |
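A wake call is useful before sending traffic to a deployment that may have scaled to zero. A minimal sketch that builds (but does not send) the wake request; the host pattern and the `Api-Key` authorization scheme are assumptions, so verify both in your dashboard.

```python
import urllib.request

def build_wake_request(model_id: str, api_key: str) -> urllib.request.Request:
    """Build a POST /production/wake request for a model deployment."""
    return urllib.request.Request(
        url=f"https://model-{model_id}.api.baseten.co/production/wake",  # assumed host pattern
        headers={"Authorization": f"Api-Key {api_key}"},  # assumed auth scheme
        method="POST",
    )

wake_req = build_wake_request("abc123", "YOUR_API_KEY")
# urllib.request.urlopen(wake_req) would issue the wake call.
```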