How to call your model
Run inference on deployed models
Once you’ve deployed your model, it’s time to use it! Every model on Baseten is served behind an API endpoint. To call a model, you need:
- The model’s ID.
- An API key for your Baseten account.
- JSON-serializable model input.
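The three requirements above can be checked before any request is sent. The following is a minimal sketch, not part of any Baseten library: `build_request_parts` is a hypothetical helper name, and the `Api-Key` header format is taken from the examples on this page.

```python
import json
import os


def build_request_parts(model_input, api_key=None):
    """Return (headers, body) for an inference request.

    Hypothetical helper for illustration only.
    """
    # Fall back to the environment variable used elsewhere on this page.
    api_key = api_key or os.environ.get("BASETEN_API_KEY")
    if not api_key:
        raise RuntimeError("Set BASETEN_API_KEY or pass api_key explicitly")

    # json.dumps raises TypeError for unsupported types (for example, sets)
    # and ValueError for circular references.
    try:
        body = json.dumps(model_input)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"Model input is not JSON-serializable: {exc}") from exc

    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
    return headers, body
```

Failing early on input that cannot be serialized gives a clearer error than a failure inside the HTTP client.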
You can call a model using the:
- /predict endpoint for the production deployment, development deployment or other published deployment.
- /async_predict endpoint for the production deployment, development deployment or other published deployment.
- Truss CLI command truss predict.
- “Call model” button on the model dashboard within your Baseten workspace.
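As a rough sketch of how the endpoint URL varies by deployment, the hypothetical helper below builds the URL for either endpoint. Only the production path is shown on this page; the `development` and `deployment/<deployment_id>` path shapes are assumptions that should be checked against the inference API reference.

```python
def predict_url(model_id, target="production", use_async=False):
    """Build an inference URL (hypothetical helper, not a Baseten API).

    `target` is "production", "development", or a deployment ID.
    """
    if not model_id:
        raise ValueError("model_id must not be empty")

    if target in ("production", "development"):
        path = target
    else:
        # Assumption: other published deployments are addressed by ID.
        path = f"deployment/{target}"

    endpoint = "async_predict" if use_async else "predict"
    return f"https://model-{model_id}.api.baseten.co/{path}/{endpoint}"
```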
Call by API endpoint
import os

import requests

model_id = ""  # ID of the model to call

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={},  # JSON-serializable model input
)
print(resp.json())
See the inference API reference for more details.
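In practice it can help to add a timeout and to fail loudly on HTTP errors before parsing the response. The sketch below wraps the call above in a hypothetical `call_model` function. It takes the HTTP function as its first argument (for example `requests.post`) so the logic can be exercised without a network connection; the 30-second default timeout is an arbitrary assumption, not a documented recommendation.

```python
def call_model(post, model_id, api_key, model_input, timeout=30.0):
    """Call the production /predict endpoint and return the parsed JSON.

    `post` is any callable with the same interface as `requests.post`.
    Hypothetical helper for illustration only.
    """
    resp = post(
        f"https://model-{model_id}.api.baseten.co/production/predict",
        headers={"Authorization": f"Api-Key {api_key}"},
        json=model_input,
        timeout=timeout,  # seconds; avoids waiting indefinitely
    )
    # With requests, this raises requests.HTTPError on 4xx/5xx responses.
    resp.raise_for_status()
    return resp.json()
```

With the `requests` library installed, a call would look like `call_model(requests.post, model_id, baseten_api_key, {"prompt": "hello world!"})`.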
Call by async API endpoint
import os

import requests

model_id = ""  # ID of the model to call

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/async_predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "model_input": {"prompt": "hello world!"},
        "webhook_endpoint": "https://my_webhook.com/webhook",
        # Optional fields for priority, max_time_in_queue_seconds, etc
    },
)
print(resp.json())
See the async inference API reference for API details and the async guide for more information about running async inference.
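The async request body wraps the model input alongside delivery options. As a small sketch, the hypothetical helper below assembles that body and includes the optional fields only when they are set. The field names `priority` and `max_time_in_queue_seconds` come from the comment in the example above; their types and allowed values are not specified here, so treat them as assumptions and confirm them in the async inference API reference. Requiring `webhook_endpoint` is a simplification made for this sketch.

```python
def build_async_payload(
    model_input,
    webhook_endpoint,
    priority=None,
    max_time_in_queue_seconds=None,
):
    """Assemble the JSON body for an /async_predict request.

    Hypothetical helper for illustration only.
    """
    payload = {
        "model_input": model_input,
        "webhook_endpoint": webhook_endpoint,
    }
    # Leave optional fields out entirely when unset so the server
    # applies its own defaults.
    if priority is not None:
        payload["priority"] = priority
    if max_time_in_queue_seconds is not None:
        payload["max_time_in_queue_seconds"] = max_time_in_queue_seconds
    return payload
```

The returned dictionary can be passed as the `json=` argument of the `requests.post` call shown above.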
Call with Truss CLI
truss predict --model "$MODEL_ID" -d "$MODEL_INPUT"
See the Truss CLI reference for more details.
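Quoting JSON on the command line is easy to get wrong, so one option is to build the command from Python instead. This is a minimal sketch using only the `--model` and `-d` flags shown above; `truss_predict_command` is a hypothetical helper name, and it assumes the `truss` executable is on your PATH.

```python
import json
import shlex


def truss_predict_command(model_id, model_input):
    """Return the argument list for a `truss predict` invocation.

    Hypothetical helper for illustration only.
    """
    # Serialize the input once; passing an argument list (rather than a
    # shell string) means no extra shell quoting is needed.
    return ["truss", "predict", "--model", model_id, "-d", json.dumps(model_input)]


cmd = truss_predict_command("abc123", {"prompt": "hello world!"})
# shlex.join produces a shell-safe string, useful for logging or copy-paste.
print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)`.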