Skip to main content
cURL
curl --request PATCH \
--url https://api.baseten.co/v1/models/{model_id}/deployments/development/autoscaling_settings \
--header "Authorization: Api-Key $BASETEN_API_KEY" \
--data '{
  "min_replica": 0,
  "max_replica": 7,
  "autoscaling_window": 600,
  "scale_down_delay": 120,
  "concurrency_target": 2,
  "target_utilization_percentage": 70,
  "target_in_flight_tokens": 40000
}'
{
  "status": "ACCEPTED",
  "message": "<string>"
}

Documentation Index

Fetch the complete documentation index at: https://docs.baseten.co/llms.txt

Use this file to discover all available pages before exploring further.

To update autoscaling settings at the environment level, use the update environment settings endpoint.

Authorizations

Authorization
string
header
required

You must specify the scheme 'Api-Key' in the Authorization header. For example, Authorization: Api-Key <Your_Api_Key>

Path Parameters

model_id
string
required

Body

application/json

A request to update autoscaling settings for a deployment. All fields are optional, and we only update ones passed in.

min_replica
integer | null

Minimum number of replicas

Example:

0

max_replica
integer | null

Maximum number of replicas

Example:

7

autoscaling_window
integer | null

Timeframe of traffic considered for autoscaling decisions

Example:

600

scale_down_delay
integer | null

Waiting period before scaling down any active replica

Example:

120

concurrency_target
integer | null

Number of requests per replica before scaling up

Example:

2

target_utilization_percentage
integer | null

Target utilization percentage for scaling up/down.

Example:

70

target_in_flight_tokens
integer | null

Target number of in-flight tokens for autoscaling decisions. Early access only.

Example:

40000

Response

200 - application/json

The response to a request to update autoscaling settings.

status
enum<string>
required

Status of the request to update autoscaling settings

Available options:
ACCEPTED,
QUEUED,
UNCHANGED
message
string
required

A message describing the status of the request to update autoscaling settings