Run asynchronous inference on deployed models
Async inference lets you queue requests for processing instead of waiting on a synchronous response. Once the model produces a result, it is sent in a POST request to a user-defined webhook upon completion.
Use async requests for long-running predictions that may otherwise hit request timeouts, or for queueing large volumes of requests without managing client-side retries.

There are two ways to consume async predict results:
- Webhook: results are sent in a POST request to your webhook endpoint. If providing a webhook, you can use async inference on any model, without making any changes to your model code.
- Cloud storage: without a webhook, your model.py must save prediction results to cloud storage, writing model outputs to a cloud storage bucket or database as part of its implementation (see the sketch below). If a webhook endpoint is also provided, your model outputs will additionally be sent to your webhook.
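As a minimal sketch of the cloud storage approach, assuming the standard Truss model.py interface and an S3 bucket written via boto3 (the bucket name, key scheme, and the `id` field in the input are all placeholders):

```python
import json
import boto3

class Model:
    """Sketch of a Truss model.py that persists outputs to cloud storage."""

    def __init__(self, **kwargs):
        self._s3 = None

    def load(self):
        self._s3 = boto3.client("s3")

    def predict(self, model_input):
        output = {"echo": model_input}  # placeholder for real inference
        # Write the result to a bucket so it can be retrieved later.
        self._s3.put_object(
            Bucket="my-results-bucket",  # placeholder bucket
            Key=f"results/{model_input['id']}.json",  # placeholder key scheme
            Body=json.dumps(output),
        )
        return output
```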
Set up a webhook endpoint
Schedule an async predict request
Schedule an async predict request by calling the /async_predict endpoint on your model. The body of an /async_predict request includes the model input in the model_input field, with the addition of a webhook endpoint (from the previous step) in the webhook_endpoint field. Save the request_id from the /async_predict response to check the request's status or cancel it.
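For example, a minimal sketch of scheduling a request in Python. The model ID and input payload are placeholders, and the URL shape here mirrors the chain endpoint shown later in this page; confirm the exact URL for your deployment in the async inference API reference.

```python
import os
import requests

MODEL_ID = "abcd1234"  # placeholder model ID
API_KEY = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    # URL shape is an assumption; see the async inference API reference.
    f"https://model-{MODEL_ID}.api.baseten.co/production/async_predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={
        "model_input": {"prompt": "hello world"},  # your model's input
        "webhook_endpoint": "https://example.com/webhook",  # from the previous step
    },
)
# Save the request_id to check status or cancel the request later.
request_id = resp.json()["request_id"]
```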
Check async predict results
Using the request_id saved from the previous step, check the status of your async predict request. Once the request completes, the async predict result is delivered to your webhook in a POST request.
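As a sketch, polling for status might look like the following; the status endpoint path here is an assumption, so confirm it against the async inference API reference.

```python
import os
import requests

API_KEY = os.environ["BASETEN_API_KEY"]
request_id = "9876fedc"  # placeholder; saved from the /async_predict response

# Endpoint path is an assumption; check the async inference API reference.
resp = requests.get(
    f"https://api.baseten.co/v1/async_request/{request_id}",
    headers={"Authorization": f"Api-Key {API_KEY}"},
)
print(resp.json())  # includes the request's current status, e.g. QUEUED
```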
Secure your webhook
For Chains, schedule async requests against the async_run_remote endpoint, e.g. https://chain-{chain_id}.api.baseten.co/production/async_run_remote. The internal Chainlet-Chainlet call will still run synchronously.

Your webhook endpoint must accept POST requests with async predict results. We require that webhook endpoints use HTTPS.
We recommend running a sample request against your webhook endpoint to ensure that it can consume async predict results properly. Try running this webhook test.
For local development, we recommend using this Repl as a starting point. This code validates the webhook request and logs the payload.
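For illustration, a minimal webhook receiver might look like the following sketch. FastAPI is an assumed choice (any HTTPS-served framework works), and this version only accepts the POST and logs the payload; signature validation is covered below.

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhook")
async def handle_async_result(request: Request):
    result = await request.json()
    # Fields follow the async predict result schema described below.
    print(result.get("request_id"), result.get("type"), result.get("errors"))
    return {"status": "ok"}
```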
Async predict requests are scheduled via the /async_predict endpoint. See the async inference API reference for more endpoint details.

An async predict result is built once the model finishes executing the request (including any processing done in the model's postprocess method).
If a webhook endpoint was provided in the /async_predict
request, the async predict results will be sent in a POST
request to the webhook endpoint. Errors in executing the async prediction will be included in the errors
field of the async predict result.
Async predict result schema:
- request_id (string): the ID of the completed async request. This matches the request_id field of the /async_predict response.
- model_id (string): the ID of the model that executed the request.
- deployment_id (string): the ID of the deployment that executed the request.
- type (string): the type of the async predict result. This will always be "async_request_completed", even in error cases.
- time (datetime): the time in UTC at which the request was sent to the webhook.
- data (dict or string): the prediction output.
- errors (list): any errors that occurred in processing the async request.

Async requests wait in a QUEUED state before getting processed by the model. Time spent in this state is shown in the "Time in async queue" chart.
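Putting the schema together, an async predict result might look like the following (all values are illustrative):

```python
# Illustrative async predict result (all values are placeholders).
async_predict_result = {
    "request_id": "9876fedc",
    "model_id": "abcd1234",
    "deployment_id": "wxyz5678",
    "type": "async_request_completed",  # constant, even in error cases
    "time": "2024-01-01T00:00:00Z",     # UTC time the webhook was sent
    "data": {"output": "..."},          # the prediction output
    "errors": [],                       # populated when execution failed
}
```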
Generate a webhook secret with the "Add webhook secret" button.
"X-BASETEN-SIGNATURE"
header of the webhook request so you can verify that it is coming from Baseten.
A Baseten signature header looks like:
"X-BASETEN-SIGNATURE": "v1=signature"
Where signature
is an HMAC generated using a SHA-256 hash function calculated over the whole async predict result and signed using a webhook secret.
If multiple webhook secrets are active, a signature will be generated using each webhook secret. In the example below, the newer webhook secret was used to create newsignature
and the older (soon to expire) webhook secret was used to create oldsignature
.
"X-BASETEN-SIGNATURE": "v1=newsignature,v1=oldsignature"
To validate a Baseten signature, we recommend the following steps. A full Baseten signature validation example can be found in this Repl.

1. Compare timestamps: check how much time has passed since the result was sent, to guard against replay attacks.
2. Recompute the Baseten signature: calculate an HMAC over the whole async predict result with a SHA-256 hash function, using your webhook secret.
3. Compare signatures: check your recomputed signature against each signature in the header using compare_digest, which will return a boolean representing whether the signatures are indeed the same.
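A minimal sketch of these three steps in Python, assuming the result timestamp is ISO 8601 and that the HMAC is computed over the raw request body; the tolerance window and helper name are illustrative:

```python
import hmac
import json
from datetime import datetime, timezone
from hashlib import sha256

TOLERANCE_SECONDS = 300  # assumed freshness window for replay protection

def validate_baseten_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    result = json.loads(raw_body)

    # 1. Compare timestamps: reject results that are too old.
    # The exact timestamp format is an assumption.
    sent_at = datetime.fromisoformat(result["time"].replace("Z", "+00:00"))
    if (datetime.now(timezone.utc) - sent_at).total_seconds() > TOLERANCE_SECONDS:
        return False

    # 2. Recompute the signature: an HMAC with SHA-256 over the whole
    # async predict result, keyed with the webhook secret.
    expected = hmac.new(secret.encode(), raw_body, sha256).hexdigest()

    # 3. Compare signatures with compare_digest. The header may contain
    # several "v1=..." entries, one per active webhook secret.
    return any(
        hmac.compare_digest(candidate.strip().removeprefix("v1="), expected)
        for candidate in signature_header.split(",")
    )
```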