Use this endpoint to call the development deployment of your model asynchronously.
The ID of the model you want to call.
Your Baseten API key, formatted with the prefix Api-Key (e.g. {"Authorization": "Api-Key abcd1234.abcd1234"}).
There is a 256 KiB size limit on /async_predict request payloads.
JSON-serializable model input.
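For example, a minimal async call might look like the following sketch in Python. The model-{model_id}.api.baseten.co host pattern, the model ID, and the API key are illustrative placeholders; substitute your own values:

```python
import requests

MODEL_ID = "abcd1234"          # placeholder model ID
API_KEY = "abcd1234.abcd1234"  # placeholder Baseten API key

resp = requests.post(
    # Assumed host pattern for the development deployment's async endpoint.
    f"https://model-{MODEL_ID}.api.baseten.co/development/async_predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={
        # JSON-serializable model input; the payload must stay under 256 KiB.
        "model_input": {"prompt": "hello world"},
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. {"request_id": "..."}
```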
URL of the webhook endpoint. We require that webhook endpoints use HTTPS; both HTTP/2 and HTTP/1.1 protocols are supported. If webhook_endpoint is empty, your model must save prediction outputs so they can be accessed later.
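As a sketch of the receiving side, a webhook handler might look like this; the payload field names (such as request_id) are illustrative assumptions, since the delivery format is not specified here:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def receive_async_result():
    # Assumed delivery format: the prediction output arrives as a JSON POST.
    payload = request.get_json(force=True)
    print("got async result for request", payload.get("request_id"))  # assumed field
    # Acknowledge quickly; do any heavy processing out of band.
    return "", 200

if __name__ == "__main__":
    # In production, serve this behind HTTPS, as required for webhook endpoints.
    app.run(port=8000)
```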
Priority of the request. A lower value corresponds to a higher priority (e.g. requests with priority 0 are scheduled before requests with priority 1). priority must be between 0 and 2, inclusive.
Maximum time a request will spend in the queue before expiring. max_time_in_queue_seconds must be between 10 seconds and 72 hours (259,200 seconds), inclusive.
Exponential backoff parameters used to retry the model predict request.
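Taken together, a request body using these optional fields might look like the sketch below. The retry-config field name and its sub-fields are assumptions for illustration; consult the request schema for the exact shape:

```python
# Send this dict as the JSON body of an /async_predict request,
# as in the earlier example.
body = {
    "model_input": {"prompt": "hello world"},
    # HTTPS webhook that will receive the prediction output.
    "webhook_endpoint": "https://example.com/webhook",
    # 0 is the highest priority; allowed values are 0, 1, and 2.
    "priority": 0,
    # Expire the request if it waits more than one hour in the queue.
    "max_time_in_queue_seconds": 3600,
    # Assumed field name and shape for the exponential backoff retry settings.
    "inference_retry_config": {
        "max_attempts": 3,
        "initial_delay_ms": 1000,
        "max_delay_ms": 5000,
    },
}
```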
The ID of the async request.
Two types of rate limits apply when making async requests:

- Calls to the /async_predict endpoint are limited to 200 requests per second.
- Each organization is limited to 50,000 QUEUED or IN_PROGRESS async requests, summed across all deployments.

If either limit is exceeded, subsequent /async_predict requests will receive a 429 status code.
To avoid hitting these rate limits, we advise retrying calls to /async_predict with exponential backoff in response to 429 errors, as in the sketch below.
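A minimal backoff loop, under the same illustrative host-pattern assumption as the earlier example, might look like:

```python
import time

import requests

MODEL_ID = "abcd1234"          # placeholder model ID
API_KEY = "abcd1234.abcd1234"  # placeholder Baseten API key


def async_predict_with_backoff(model_input, max_retries=5):
    """POST to /async_predict, backing off exponentially on 429 responses."""
    url = f"https://model-{MODEL_ID}.api.baseten.co/development/async_predict"
    delay = 1.0  # initial backoff in seconds
    for attempt in range(max_retries):
        resp = requests.post(
            url,
            headers={"Authorization": f"Api-Key {API_KEY}"},
            json={"model_input": model_input},
        )
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()  # e.g. {"request_id": "..."}
        # Rate limited: wait, then retry with a doubled delay.
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("still rate limited after retries")
```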