Inference API
Overview
The inference API is used to call deployed models and chains.
Each deployment has a dedicated subdomain on api.baseten.co for optimized routing.
For models, the endpoints follow this format:

`https://model-{model_id}.api.baseten.co/{deployment_type_or_id}/{endpoint}`

For chains, the endpoints follow this format:

`https://chain-{chain_id}.api.baseten.co/{deployment_type_or_id}/{endpoint}`

Where:

- `model_id` – The model’s alphanumeric ID (found in your model dashboard).
- `chain_id` – The chain’s alphanumeric ID (found in your chain dashboard).
- `deployment_type_or_id` – Either `development`, `production`, or a specific deployment’s alphanumeric ID.
- `endpoint` – The API action, such as `predict`.
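As a concrete illustration, the sketch below builds a predict URL from these parameters and calls it with the `requests` library. This is a minimal sketch, not an official client: it assumes the `Api-Key` authorization header, a `BASETEN_API_KEY` environment variable, and a hypothetical model that accepts a `prompt` field; substitute your own model ID and input schema.

```python
import os

import requests

# Placeholder values for illustration; substitute your own.
model_id = "abcd1234"                    # alphanumeric ID from the model dashboard
deployment = "development"               # or "production", or a deployment ID
api_key = os.environ["BASETEN_API_KEY"]  # assumes your API key is exported here

# Build the predict URL from the parameters described above.
url = f"https://model-{model_id}.api.baseten.co/{deployment}/predict"

response = requests.post(
    url,
    headers={"Authorization": f"Api-Key {api_key}"},
    json={"prompt": "Hello, world!"},    # request body shape is defined by your model
    timeout=60,
)
response.raise_for_status()
print(response.json())
```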
For long-running tasks, the inference API supports asynchronous inference with priority queuing.
Predict endpoints
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /environments/{env_name}/predict | Call the deployment associated with the specified environment. |
| POST | /development/predict | Call the development deployment. |
| POST | /deployment/{deployment_id}/predict | Call any deployment by its ID. |
| POST | /deployment/{deployment_id}/async_predict | Async inference: call any published deployment of your model. |
| POST | /environments/{env_name}/async_predict | Async inference: call the deployment associated with the specified environment. |
| POST | /development/async_predict | Async inference: call the development deployment. |
| DELETE | /async_request/{request_id} | Async inference: cancel an async request. |
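For example, an async call to the production environment can be sketched as below. The endpoint path comes from the table above; the request body fields (`model_input`, `webhook_endpoint`) and the `request_id` response field are assumptions to verify against the async inference documentation.

```python
import os

import requests

model_id = "abcd1234"                    # placeholder model ID
api_key = os.environ["BASETEN_API_KEY"]

url = f"https://model-{model_id}.api.baseten.co/environments/production/async_predict"

response = requests.post(
    url,
    headers={"Authorization": f"Api-Key {api_key}"},
    json={
        # Assumed body fields: the model's input plus a webhook to receive the result.
        "model_input": {"prompt": "Hello, world!"},
        "webhook_endpoint": "https://example.com/baseten-webhook",
    },
    timeout=60,
)
response.raise_for_status()
request_id = response.json()["request_id"]   # assumed response field
print(f"Queued async request {request_id}")
```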
Async status endpoints
| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /async_request/{request_id} | Get the status of an async request. |
| GET | /environments/{env_name}/async_queue_status | Get the async queue status for a model associated with the specified environment. |
| GET | /development/async_queue_status | Get the status of a development deployment’s async queue. |
| GET | /deployment/{deployment_id}/async_queue_status | Get the status of a deployment’s async queue. |
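If you are not relying on a webhook, the status endpoint can be polled until the request leaves the queue. A minimal sketch, assuming the request ID returned by `async_predict`; the `status` field and its values are assumed names to check against the actual response schema.

```python
import os
import time

import requests

model_id = "abcd1234"                    # placeholder model ID
request_id = "9876fedc"                  # placeholder ID returned by async_predict
api_key = os.environ["BASETEN_API_KEY"]

url = f"https://model-{model_id}.api.baseten.co/async_request/{request_id}"

while True:
    response = requests.get(
        url, headers={"Authorization": f"Api-Key {api_key}"}, timeout=30
    )
    response.raise_for_status()
    body = response.json()
    # "status" and the values below are assumed; check the response schema.
    if body.get("status") not in ("QUEUED", "IN_PROGRESS"):
        print(body)
        break
    time.sleep(2)
```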
Wake endpoints
| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /production/wake | Wake the production environment of your model. |
| POST | /development/wake | Wake the development deployment of your model. |
| POST | /deployment/{deployment_id}/wake | Wake any deployment of your model. |
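Waking a deployment ahead of time avoids a cold-start delay on the first request after scale-to-zero. A minimal sketch using the production wake endpoint from the table above, with the same placeholder ID and auth-header assumptions as the earlier examples:

```python
import os

import requests

model_id = "abcd1234"                    # placeholder model ID
api_key = os.environ["BASETEN_API_KEY"]

# Use /development/wake or /deployment/{deployment_id}/wake for other deployments.
url = f"https://model-{model_id}.api.baseten.co/production/wake"

response = requests.post(url, headers={"Authorization": f"Api-Key {api_key}"}, timeout=30)
response.raise_for_status()
print("Wake request accepted")
```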