import requests
import os

model_id = ""  # Replace this with your model ID
webhook_endpoint = ""  # Replace this with your webhook endpoint URL
# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Call the async_predict endpoint of the production deployment
resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/async_predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "model_input": {"prompt": "hello world!"},
        "webhook_endpoint": webhook_endpoint
        # Optional fields: priority, max_time_in_queue_seconds, etc.
    },
)

print(resp.json())

Example response:

{
  "request_id": "<string>"
}
This is a beta feature and subject to breaking changes.

Parameters

model_id
string
required

The ID of the model you want to call.

Headers

Authorization
string
required

Your Baseten API key, formatted with prefix Api-Key (e.g. {"Authorization": "Api-Key abcd1234.abcd1234"}).

Body

There is a 256 KiB size limit to /async_predict request payloads.
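Because oversized payloads are rejected, it can help to check the serialized size client-side before calling the endpoint. A minimal sketch, assuming the request body is a plain JSON-serializable dict (the helper name and constant are ours, not part of the Baseten API):

```python
import json

# 256 KiB limit on /async_predict request payloads
MAX_ASYNC_PAYLOAD_BYTES = 256 * 1024

def payload_size_ok(payload: dict) -> bool:
    """Return True if the JSON-serialized payload fits under the 256 KiB limit."""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_ASYNC_PAYLOAD_BYTES

payload = {
    "model_input": {"prompt": "hello world!"},
    "webhook_endpoint": "https://example.com/webhook",
}
print(payload_size_ok(payload))  # → True
```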

model_input
json
required

JSON-serializable model input.

webhook_endpoint
string
default: null

URL of the webhook endpoint. We require that webhook endpoints use HTTPS.

Baseten does not store model outputs. If webhook_endpoint is empty, your model must save its prediction outputs so they can be accessed later.
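Since webhook endpoints must use HTTPS, a quick client-side check can catch misconfigured URLs before the request is sent. A sketch under that assumption (the helper is illustrative, not part of any Baseten client library):

```python
from urllib.parse import urlparse

def check_webhook_endpoint(url: str) -> str:
    """Raise ValueError if the webhook URL is not HTTPS; return it unchanged otherwise."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"webhook_endpoint must be an HTTPS URL, got: {url!r}")
    return url

check_webhook_endpoint("https://example.com/baseten-webhook")  # passes
# check_webhook_endpoint("http://example.com/hook")  # would raise ValueError
```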

priority
integer
default: 0

Priority of the request. A lower value corresponds to a higher priority (e.g. requests with priority 0 are scheduled before requests with priority 1).

priority must be between 0 and 2, inclusive.

max_time_in_queue_seconds
integer
default: 600

Maximum time a request will spend in the queue before expiring.

max_time_in_queue_seconds must be between 10 seconds and 12 hours, inclusive.

inference_retry_config
json

Exponential backoff parameters used to retry the model predict request.

Response

request_id
string
required

The ID of the async request.
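Because Baseten does not store model outputs, the request_id is what lets you correlate an eventual webhook delivery (or saved output) with the original request. A sketch of that bookkeeping, using an in-memory dict purely for illustration; production code would persist the mapping:

```python
# Map request_id -> original model_input so webhook deliveries can be
# matched back to the request that produced them. Illustrative only.
pending_requests: dict[str, dict] = {}

def record_request(response_json: dict, model_input: dict) -> str:
    """Store the input keyed by the request_id returned from /async_predict."""
    request_id = response_json["request_id"]
    pending_requests[request_id] = model_input
    return request_id

# e.g. with resp.json() from the example at the top of the page:
rid = record_request({"request_id": "abc-123"}, {"prompt": "hello world!"})
print(rid)  # → abc-123
```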