`POST` request to a user-defined webhook upon completion.
Use async requests for:
- Long-running inference tasks that may otherwise hit request timeouts.
- Batched inference jobs.
- Prioritizing certain inference requests.
Async fast facts:
- Async requests can be made to any model—no model code changes necessary.
- Async requests can remain queued for up to 72 hours and run for up to 1 hour.
- Async requests are not compatible with streaming model output.
- Async request inputs and model outputs are not stored after an async request has been completed. Instead, model outputs will be sent to your webhook via a `POST` request.
Quick start
There are two ways to use async inference:
- Provide a webhook endpoint where model outputs will be sent via a `POST` request. If you provide a webhook, you can use async inference on any model, without making any changes to your model code.
- Inside your Truss' `model.py`, save prediction results to cloud storage. `model.py` must write model outputs to a cloud storage bucket or database as part of its implementation. If a webhook endpoint is also provided, your model outputs will be sent to your webhook as well.
1. Set up webhook endpoint
Set up a webhook endpoint for handling completed async requests. Since Baseten doesn't store model outputs, model outputs from async requests will be sent to your webhook endpoint. Before creating your first async request, try running a sample request against your webhook endpoint to ensure that it can consume async predict results properly. Check out this example webhook test. We recommend using this Repl as a starting point.
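To exercise an endpoint before scheduling real requests, you can post a sample payload shaped like an async predict result to a stand-in server. This sketch uses only the Python standard library; the endpoint path and payload values are illustrative placeholders:

```python
# Sketch: post a sample async predict result to a local stand-in webhook
# server to confirm the payload shape can be consumed. Field names follow
# the async predict result schema; values are placeholders.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

received = []

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        received.append(json.loads(body))  # in production: validate, then process
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), WebhookHandler)  # port 0 = auto-assign
threading.Thread(target=server.serve_forever, daemon=True).start()

# A sample payload shaped like an async predict result.
sample = {
    "request_id": "test-request-id",
    "model_id": "test-model-id",
    "deployment_id": "test-deployment-id",
    "type": "async_request_completed",
    "time": "2024-01-01T00:00:00Z",
    "data": {"output": "hello"},
    "errors": [],
}
req = Request(
    f"http://127.0.0.1:{server.server_port}/webhook",
    data=json.dumps(sample).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(resp.status)  # 200 once the endpoint has consumed the payload
server.shutdown()
```

Point the same sample request at your real HTTPS endpoint to verify it end to end.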
2. Schedule an async predict request
Call `/async_predict` on your model. The body of an `/async_predict` request includes the model input in the `model_input` field, with the addition of a webhook endpoint (from the previous step) in the `webhook_endpoint` field. A successful call returns a `201` response. Save the `request_id` from the `/async_predict` response to check its status or cancel it. See the async inference API reference for more endpoint details.
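The request described above can be sketched as follows. The URL pattern is an assumption modeled on Baseten's endpoint conventions (compare the Chain endpoint mentioned later in this guide); the model ID, API key, and webhook URL are placeholders:

```python
# Sketch of assembling an /async_predict call. URL pattern is an assumed
# convention; model ID, API key, and webhook URL are placeholders.
def build_async_predict_request(model_id, api_key, model_input, webhook_endpoint):
    url = f"https://model-{model_id}.api.baseten.co/production/async_predict"
    headers = {"Authorization": f"Api-Key {api_key}"}
    # model_input carries the model's usual input; webhook_endpoint is
    # where the async predict result will be POSTed.
    body = {"model_input": model_input, "webhook_endpoint": webhook_endpoint}
    return url, headers, body

url, headers, body = build_async_predict_request(
    model_id="abcd1234",            # placeholder
    api_key="YOUR_API_KEY",         # placeholder
    model_input={"prompt": "hello world"},
    webhook_endpoint="https://example.com/webhook",  # from step 1
)
# Send with e.g. requests.post(url, headers=headers, json=body); save
# request_id from the 201 response to check status or cancel later.
print(url)
```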
3. Check async predict results
Using the `request_id` saved from the previous step, check the status of your async predict request. Once your model has finished executing the request, the async predict result will be sent to your webhook in a `POST` request.
4. Secure your webhook
We strongly recommend securing the requests sent to your webhooks to validate that they are from Baseten. For instructions, see our guide to securing async requests.
Chains: this guide is written for Truss models, but Chains support async inference as well. A Chain entrypoint can be invoked via its `async_run_remote` endpoint, e.g. `https://chain-{chain_id}.api.baseten.co/production/async_run_remote`. Internal Chainlet-to-Chainlet calls will still run synchronously.
User guide
Configuring the webhook endpoint
Configure your webhook endpoint to handle `POST` requests with async predict results. We require that webhook endpoints use HTTPS.
We recommend running a sample request against your webhook endpoint to ensure that it can consume async predict results properly. Try running this webhook test.
For local development, we recommend using this Repl as a starting point. This code validates the webhook request and logs the payload.
Making async requests
Make async requests by calling the `/async_predict` endpoint. See the async inference API reference for more endpoint details.
Getting and canceling async requests
You may get the status of an async request for up to 1 hour after the request has been completed.
Processing async predict results
Baseten does not store async predict results. Ensure that prediction outputs are either processed by your webhook, or saved to cloud storage in your model code (for example, in your model's `postprocess` method).
If a webhook endpoint was provided in the `/async_predict` request, the async predict results will be sent in a `POST` request to the webhook endpoint. Errors in executing the async prediction will be included in the `errors` field of the async predict result.
Async predict result schema:
- `request_id` (string): the ID of the completed async request. This matches the `request_id` field of the `/async_predict` response.
- `model_id` (string): the ID of the model that executed the request.
- `deployment_id` (string): the ID of the deployment that executed the request.
- `type` (string): the type of the async predict result. This will always be `"async_request_completed"`, even in error cases.
- `time` (datetime): the time in UTC at which the request was sent to the webhook.
- `data` (dict or string): the prediction output.
- `errors` (list): any errors that occurred in processing the async request.
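A webhook consumer can branch on these fields. A minimal sketch, with an illustrative payload (all values are placeholders):

```python
# Sketch of consuming an async predict result. Field names follow the
# documented schema; the branching and messages are illustrative.
def handle_async_result(result: dict) -> str:
    if result.get("type") != "async_request_completed":
        raise ValueError(f"unexpected payload type: {result.get('type')}")
    if result.get("errors"):
        # Errors from executing the async prediction land in `errors`,
        # even though `type` is still "async_request_completed".
        return f"request {result['request_id']} failed: {result['errors']}"
    # `data` holds the prediction output (dict or string).
    return f"request {result['request_id']} succeeded: {result['data']}"

example = {
    "request_id": "abc123",
    "model_id": "m1",
    "deployment_id": "d1",
    "type": "async_request_completed",
    "time": "2024-01-01T00:00:00Z",
    "data": {"output": "hello"},
    "errors": [],
}
print(handle_async_result(example))
```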
Observability
Metrics for async request execution are available on the Metrics tab of your model dashboard.
- Async requests are included in inference latency and volume metrics.
- A time in async queue chart displays the time an async predict request spent in the `QUEUED` state before getting processed by the model.
- An async queue size chart displays the current number of queued async predict requests.

The time in async queue chart.
Securing async inference
Since async predict results are sent to a webhook endpoint reachable by anyone on the internet, you'll want to verify that the results sent to your webhook are actually coming from Baseten. We recommend using webhook signatures to secure webhook payloads and ensure they are from Baseten. This is a two-step process:
- Create a webhook secret.
- Validate a webhook signature sent as a header along with the webhook request payload.
Creating webhook secrets
Webhook secrets can be generated via the Secrets tab.
Generate a webhook secret with the "Add webhook secret" button.
Validating webhook signatures
If a webhook secret exists, Baseten will include a webhook signature in the `X-BASETEN-SIGNATURE` header of the webhook request so you can verify that it is coming from Baseten.
A Baseten signature header looks like:
"X-BASETEN-SIGNATURE": "v1=signature"
where `signature` is an HMAC, generated using a SHA-256 hash function, calculated over the whole async predict result and signed using a webhook secret.
If multiple webhook secrets are active, a signature will be generated using each webhook secret. In the example below, the newer webhook secret was used to create `newsignature` and the older (soon-to-expire) webhook secret was used to create `oldsignature`:
"X-BASETEN-SIGNATURE": "v1=newsignature,v1=oldsignature"
To validate a Baseten signature, we recommend the following. A full Baseten signature validation example can be found in this Repl.
1. Compare timestamps
Compare the async predict result timestamp with the current time and decide if it was received within an acceptable tolerance window.
2. Recompute Baseten signature
Recreate the Baseten signature using webhook secret(s) and the async predict result.
3. Compare signatures
Compare the expected Baseten signature with the actual computed signature using `compare_digest`, which returns a boolean indicating whether the signatures are the same.
Keeping webhook secrets secure
We recommend periodically rotating webhook secrets.
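The three validation steps above can be sketched as follows, assuming the signature is a hex-encoded HMAC-SHA256 of the raw request body. The tolerance window and sample values are illustrative; the linked Repl remains the canonical reference:

```python
# Sketch of the three validation steps. Assumes a hex-encoded
# HMAC-SHA256 over the raw request body; the 5-minute tolerance window
# and all sample values are illustrative.
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # acceptable clock skew (assumed value)

def validate_signature(secrets, signature_header, body, timestamp):
    # 1. Compare timestamps: reject payloads outside the tolerance window.
    if abs(time.time() - timestamp) > TOLERANCE_SECONDS:
        return False
    # The header may carry several "v1=<sig>" entries, one per active secret.
    provided = [
        part.split("=", 1)[1]
        for part in signature_header.split(",")
        if part.startswith("v1=")
    ]
    for secret in secrets:
        # 2. Recompute the Baseten signature with each active webhook secret.
        expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
        # 3. Compare signatures with compare_digest to avoid timing attacks.
        if any(hmac.compare_digest(expected, sig) for sig in provided):
            return True
    return False

secret = "whsec_example"            # placeholder webhook secret
body = b'{"request_id": "abc123"}'  # raw webhook request body
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(validate_signature([secret], f"v1={sig}", body, time.time()))  # True
```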
FAQs
Can I run sync and async requests on the same model?
Yes, you can run both sync and async requests on the same model. Sync requests always take priority over async requests. Keep the following in mind:
- Rate limits: rate limits apply to async requests. Learn more
- Concurrency: both sync and async requests count toward the total number of concurrent requests. Learn more