import os

import requests

model_id = ""  # ID of the model you want to call
webhook_endpoint = ""  # HTTPS URL that will receive the async result

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/development/async_predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "model_input": {"prompt": "hello world!"},
        "webhook_endpoint": webhook_endpoint,
        # Optional fields: priority, max_time_in_queue_seconds, etc.
    },
)

print(resp.json())

Example response:

{
  "request_id": "<string>"
}

This is a beta feature and subject to breaking changes.

Parameters

model_id
string
required

The ID of the model you want to call.

Headers

Authorization
string
required

Your Baseten API key, formatted with the prefix Api-Key (e.g. {"Authorization": "Api-Key abcd1234.abcd1234"}).

Body

There is a 256 KiB size limit on /async_predict request payloads.
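
Because the limit applies to the serialized request body, you can check the payload size client-side before sending. A minimal sketch (the 256 KiB figure comes from the limit above; the payload values are illustrative):

import json

payload = {
    "model_input": {"prompt": "hello world!"},
    "webhook_endpoint": "https://example.com/webhook",  # illustrative URL
}

# Serialize roughly the way requests does for its json= parameter,
# then compare against the documented 256 KiB limit.
body = json.dumps(payload).encode("utf-8")
if len(body) > 256 * 1024:
    raise ValueError(f"Payload is {len(body)} bytes, over the 256 KiB limit")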

model_input
json
required

JSON-serializable model input.

webhook_endpoint
string
required

URL of the webhook endpoint. We require that webhook endpoints use HTTPS.
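
For reference, a minimal sketch of a receiver for this endpoint using Flask. The route path and the payload fields read in the handler (request_id, data) are illustrative assumptions, not a documented schema:

from flask import Flask, request

app = Flask(__name__)

# Hypothetical route; serve it over HTTPS, since HTTP endpoints are not accepted.
@app.route("/webhook", methods=["POST"])
def handle_async_result():
    payload = request.get_json()
    # Field names here are assumptions for illustration.
    print(payload.get("request_id"), payload.get("data"))
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)  # put a TLS-terminating proxy in front for HTTPS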

priority
integer
default: 0

Priority of the request. A lower value corresponds to a higher priority (e.g. requests with priority 0 are scheduled before requests with priority 1).

priority must be between 0 and 2, inclusive.

max_time_in_queue_seconds
integer
default: 600

Maximum time a request will spend in the queue before expiring.

max_time_in_queue_seconds must be between 10 and 43,200 (12 hours), inclusive.
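
Both scheduling fields above go alongside model_input in the request body. For example, reusing the variables from the snippet at the top of this page:

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/development/async_predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "model_input": {"prompt": "hello world!"},
        "webhook_endpoint": webhook_endpoint,
        "priority": 0,                      # highest priority
        "max_time_in_queue_seconds": 3600,  # expire after an hour in the queue
    },
)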

inference_retry_config
json

Exponential backoff parameters used to retry the model predict request.
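
A sketch of what this could look like in the request body. The nested field names (max_attempts, initial_delay_ms, max_delay_ms) are assumptions for illustration; check the current schema before relying on them:

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/development/async_predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "model_input": {"prompt": "hello world!"},
        "webhook_endpoint": webhook_endpoint,
        # Field names below are illustrative assumptions, not a confirmed schema.
        "inference_retry_config": {
            "max_attempts": 3,         # total predict attempts
            "initial_delay_ms": 1000,  # delay before the first retry
            "max_delay_ms": 5000,      # cap on the exponential backoff
        },
    },
)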

Response

request_id
string
required

The ID of the async request.
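
Hold on to the request_id if you need to correlate the webhook delivery with the original call, or to poll for status. A sketch of polling a separate async-status endpoint (the path below is an assumption, not confirmed by this page; verify it against the API reference):

request_id = resp.json()["request_id"]

# Status endpoint path is an assumption; check the current API reference.
status = requests.get(
    f"https://model-{model_id}.api.baseten.co/async_request/{request_id}",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
)
print(status.json())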