Once you’ve deployed your model, it’s time to use it! Every model on Baseten is served behind an API endpoint. To call a model, you need:

  • The model’s ID.
  • An API key for your Baseten account.
  • JSON-serializable model input.

You can call a model in three ways: by its API endpoint, by its async API endpoint, or with the Truss CLI.

Call by API endpoint

import requests
import os

model_id = ""  # Paste your model's ID here
# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={}, # JSON-serializable model input
)

print(resp.json())

See the inference API reference for more details.
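If you call the same model from several places, a thin wrapper can centralize URL construction, authentication, and error handling. This is a sketch, not part of any Baseten SDK; the helper names are our own, and the `deployment` parameter simply mirrors the `production` path segment in the URL above:

```python
import os

import requests


def predict_url(model_id: str, deployment: str = "production") -> str:
    # Mirrors the endpoint pattern shown above; other deployments would
    # swap out the "production" path segment.
    return f"https://model-{model_id}.api.baseten.co/{deployment}/predict"


def call_model(model_id: str, model_input: dict) -> dict:
    resp = requests.post(
        predict_url(model_id),
        headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
        json=model_input,  # must be JSON-serializable
    )
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return resp.json()
```

Calling `resp.raise_for_status()` turns HTTP errors (for example, a bad API key or an unknown model ID) into exceptions at the call site rather than silently returning an error body.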

Call by async API endpoint

This is a beta feature and subject to breaking changes.

import requests
import os

model_id = ""  # Paste your model's ID here
# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/async_predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
      "model_input": {"prompt": "hello world!"},
      "webhook_endpoint": "https://my_webhook.com/webhook"
      # Optional fields for priority, max_time_in_queue_seconds, etc
    }
)

print(resp.json())

See the async inference API reference for API details and the async guide for more information about running async inference.
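With async inference, the response to `async_predict` does not contain the model output; the result is later POSTed to the `webhook_endpoint` you supplied. A minimal receiver built on Python's standard library might look like the sketch below. The payload field names (`request_id`, `data`) are assumptions for illustration; check the async guide for the exact schema Baseten delivers to your webhook.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_async_result(body: bytes) -> tuple:
    """Extract the request ID and model output from a webhook payload.

    The field names here are assumptions -- consult the async guide
    for the exact payload schema.
    """
    payload = json.loads(body)
    return payload.get("request_id"), payload.get("data")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request_id, result = parse_async_result(self.rfile.read(length))
        print(f"request {request_id} finished: {result}")
        # Acknowledge receipt with a 200 so the delivery is not retried
        self.send_response(200)
        self.end_headers()


# To run locally (your webhook_endpoint must forward to this port):
# HTTPServer(("", 8000), WebhookHandler).serve_forever()
```

In production you would also verify that the request genuinely came from Baseten before trusting the payload.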

Call with Truss CLI

truss predict --model $MODEL_ID -d "$MODEL_INPUT"

See the Truss CLI reference for more details.