Asynchronously call a named environment of a model.
curl --request POST \
  --url https://model-{model_id}.api.baseten.co/environments/{env_name}/async_predict \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model_input": {},
  "webhook_endpoint": "<string>",
  "priority": 0,
  "max_time_in_queue_seconds": 600,
  "inference_retry_config": {
    "max_attempts": 3,
    "initial_delay_ms": 1000,
    "max_delay_ms": 5000
  }
}
'
Example response:
{
  "request_id": "<string>"
}
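The curl call above can also be issued from Python. A minimal sketch, assuming the `requests` library for the actual send; the model ID, environment name, and API key below are placeholders, and the helper name is hypothetical. It also enforces the 256 KiB payload limit client-side before sending.

```python
import json


def build_async_predict_request(model_id, env_name, api_key, model_input,
                                webhook_endpoint=None, priority=0,
                                max_time_in_queue_seconds=600):
    """Assemble the URL, headers, and JSON body for an async_predict call."""
    url = (f"https://model-{model_id}.api.baseten.co"
           f"/environments/{env_name}/async_predict")
    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model_input": model_input,
        "priority": priority,
        "max_time_in_queue_seconds": max_time_in_queue_seconds,
    }
    if webhook_endpoint is not None:
        body["webhook_endpoint"] = webhook_endpoint
    payload = json.dumps(body)
    # Async predict payloads are limited to 256 KiB; fail fast before sending.
    if len(payload.encode("utf-8")) > 256 * 1024:
        raise ValueError("async predict payload exceeds 256 KiB limit")
    return url, headers, payload
```

The returned triple can then be sent with, e.g., `requests.post(url, headers=headers, data=payload)`; the response JSON contains the `request_id`.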

Authorizations

Authorization
string
header
required

API key with the Api-Key prefix, e.g. Authorization: Api-Key abcd1234.abcd1234.

Path Parameters

env_name
string
required

The name of the environment (e.g. production, staging).

Body

application/json

There is a 256 KiB size limit on async predict request payloads.

model_input
object
required

JSON-serializable model input.

webhook_endpoint
string<uri>

HTTPS URL to receive the prediction result via webhook. Both HTTP/2 and HTTP/1.1 are supported. If omitted, the model must save outputs so they can be accessed later.

priority
integer
default:0

Priority of the request. Lower values are higher priority.

Required range: 0 <= x <= 2

max_time_in_queue_seconds
integer
default:600

Maximum time in seconds a request will spend in the queue before expiring. Must be between 10 seconds and 72 hours.

Required range: 10 <= x <= 259200

inference_retry_config
object

Exponential backoff parameters for retrying predict requests.
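To illustrate how the fields in the request example (`max_attempts`, `initial_delay_ms`, `max_delay_ms`) plausibly interact: a common exponential-backoff schedule doubles the delay between attempts and caps it at the maximum. The doubling factor is an assumption; the doc only states that retries use exponential backoff.

```python
def retry_delays_ms(max_attempts=3, initial_delay_ms=1000, max_delay_ms=5000):
    """Delay before each retry, assuming doubling capped at max_delay_ms.

    No delay precedes the first attempt, so max_attempts attempts
    produce max_attempts - 1 delays.
    """
    delays = []
    for retry in range(max_attempts - 1):
        delays.append(min(initial_delay_ms * 2 ** retry, max_delay_ms))
    return delays
```

With the example config (`max_attempts: 3`, `initial_delay_ms: 1000`, `max_delay_ms: 5000`) this yields delays of 1000 ms and 2000 ms; a fifth attempt under the same config would be capped at 5000 ms.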

Response

Async predict request enqueued.

request_id
string
required

The ID of the async request.