Call your model
Run inference on deployed models
Once deployed, your model is accessible via an API endpoint. To make an inference request, you’ll need:
- Your model ID
- An API key for your Baseten account
- JSON-serializable model input
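For example, a minimal request against the /predict endpoint described below might look like the following sketch. The model ID, the exact endpoint URL form, and the input payload are placeholders; check your model's dashboard and the API reference for the values that apply to your deployment.

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder: your model ID from the Baseten dashboard
API_KEY = os.environ["BASETEN_API_KEY"]  # your Baseten API key

# Assumed URL form for the production environment's predict endpoint;
# see the API reference for the endpoints available to your deployment.
url = f"https://model-{MODEL_ID}.api.baseten.co/environments/production/predict"

resp = requests.post(
    url,
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"prompt": "What is the capital of France?"},  # JSON-serializable input
)
resp.raise_for_status()
print(resp.json())
```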
Predict API endpoints
Baseten provides multiple endpoints for different inference modes:
- /predict – Standard synchronous inference.
- /async_predict – Asynchronous inference for long-running tasks.
Endpoints are available for environments and all deployments. See the API reference for details.
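As a sketch of the asynchronous mode, the request below assumes the async_predict schema from the API reference: the input is wrapped in a model_input field, with an optional webhook_endpoint that Baseten calls once inference completes. The model ID and webhook URL are placeholders.

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder model ID
API_KEY = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/environments/production/async_predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={
        # The actual model input is nested under "model_input".
        "model_input": {"prompt": "Summarize the attached document."},
        # Optional: Baseten POSTs the result here when the run finishes.
        "webhook_endpoint": "https://example.com/webhooks/baseten",
    },
)
resp.raise_for_status()
# The response contains a request ID you can use to track the async run.
print(resp.json())
```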
Sync API endpoints
Custom servers support both predict endpoints as well as a special sync endpoint. The sync endpoint lets you call the individual routes defined in your custom server.
Here are a few examples for the custom server above, showing how the sync endpoint maps to its routes:
- https://model-{model_id}.../sync/health -> /health
- https://model-{model_id}.../sync/items -> /items
- https://model-{model_id}.../sync/items/123 -> /items/123
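Concretely, calling those routes might look like the sketch below. The full host URL is abbreviated in the mappings above, so the base URL here (including the environment path segment) is an assumption; take the exact /sync URL from your model's dashboard.

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder model ID
API_KEY = os.environ["BASETEN_API_KEY"]

# Assumed base URL; the segment between the host and /sync may differ
# for your deployment or environment.
BASE = f"https://model-{MODEL_ID}.api.baseten.co/environments/production/sync"
headers = {"Authorization": f"Api-Key {API_KEY}"}

# .../sync/health -> /health on the custom server
print(requests.get(f"{BASE}/health", headers=headers).json())

# .../sync/items/123 -> /items/123 on the custom server
print(requests.get(f"{BASE}/items/123", headers=headers).json())
```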
OpenAI SDK
When you deploy a model with Engine Builder, you get an OpenAI-compatible server. If you are already using one of the OpenAI SDKs, simply update the base URL to your Baseten model URL and include your Baseten API key.
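With the official OpenAI Python SDK, that switch is a sketch like the one below. The base URL form and the model name are assumptions; copy the exact URL shown for your deployment in the Baseten dashboard.

```python
import os

from openai import OpenAI

MODEL_ID = "abcd1234"  # placeholder model ID

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],  # Baseten API key, not an OpenAI key
    # Assumed base URL form; use the URL from your model's dashboard.
    base_url=f"https://model-{MODEL_ID}.api.baseten.co/environments/production/sync/v1",
)

response = client.chat.completions.create(
    model="my-model",  # placeholder; some deployments ignore this field
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```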
Alternative invocation methods
- Truss CLI: truss predict
- Model Dashboard: “Playground” button in the Baseten UI