Any model deployment by ID
Updates a deployment’s autoscaling settings and returns the update status.
Authorizations
Pass your Baseten API key. Clients automatically send Authorization: Bearer <key>. Direct callers can also use Authorization: Api-Key <key>; both schemes are accepted.
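As a minimal sketch of the two accepted schemes, the helper below builds the `Authorization` header for a direct caller. The function name `auth_header` and its `scheme` parameter are illustrative only and are not part of any Baseten client.

```python
def auth_header(api_key: str, scheme: str = "Bearer") -> dict:
    """Build an Authorization header using one of the two accepted schemes."""
    # The text above names exactly two schemes: "Bearer" and "Api-Key".
    if scheme not in ("Bearer", "Api-Key"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"{scheme} {api_key}"}


# Placeholder key for illustration; substitute your real API key.
print(auth_header("YOUR_API_KEY"))
print(auth_header("YOUR_API_KEY", scheme="Api-Key"))
```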
Body
A request to update autoscaling settings for a deployment. All fields are optional; only the fields you pass are updated.
Minimum number of replicas
0
Maximum number of replicas
7
Timeframe of traffic considered for autoscaling decisions
600
Waiting period before scaling down any active replica
120
Number of requests per replica before scaling up
2
Target utilization percentage for scaling up/down.
70
Target number of in-flight tokens for autoscaling decisions. Early access only.
40000
Maximum rate at which replicas can scale down (e.g. 2.0 means the replica count can at most halve per window).
1 < x <= 22
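Because every body field is optional and only the fields passed are updated, a client can build a partial body and omit everything it does not want to change. The sketch below illustrates that pattern. The JSON key names (`min_replica`, `max_replica`, `autoscaling_window`, `scale_down_delay`, `concurrency_target`, `target_utilization_percentage`) are assumptions, since the field names do not appear in the text above; check them against the request schema before use. The helper `build_autoscaling_update` is hypothetical, and its min/max check is a client-side convenience, not documented server behavior.

```python
import json
from typing import Optional


def build_autoscaling_update(
    min_replica: Optional[int] = None,
    max_replica: Optional[int] = None,
    autoscaling_window: Optional[int] = None,
    scale_down_delay: Optional[int] = None,
    concurrency_target: Optional[int] = None,
    target_utilization_percentage: Optional[int] = None,
) -> dict:
    """Return a partial request body containing only the fields that were set.

    All key names here are ASSUMED, not taken from the page above.
    """
    fields = {
        "min_replica": min_replica,
        "max_replica": max_replica,
        "autoscaling_window": autoscaling_window,
        "scale_down_delay": scale_down_delay,
        "concurrency_target": concurrency_target,
        "target_utilization_percentage": target_utilization_percentage,
    }
    # Drop unset fields so the server leaves those settings untouched.
    payload = {key: value for key, value in fields.items() if value is not None}

    # Client-side sanity check (an assumption, not documented server behavior).
    if (
        "min_replica" in payload
        and "max_replica" in payload
        and payload["min_replica"] > payload["max_replica"]
    ):
        raise ValueError("min_replica must not exceed max_replica")
    return payload


# Change only the replica bounds, using the example values shown above.
print(json.dumps(build_autoscaling_update(min_replica=0, max_replica=7)))
```

The resulting JSON string would be sent as the request body together with the `Authorization` header described under Authorizations; settings left out of the body keep their current values.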