Update production deployment autoscaling settings
Updates a production deployment’s autoscaling settings and returns the update status.
Authorizations
Pass your Baseten API key. Clients automatically send Authorization: Bearer <key>. Direct callers can also use Authorization: Api-Key <key>; both schemes are accepted.
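As a minimal sketch of the two accepted header forms, the helper below builds the Authorization header for either scheme. The function name `auth_header` and its `scheme` parameter are hypothetical and exist only for illustration; the two scheme strings are the ones named above.

```python
# Hypothetical helper: builds the Authorization header for either accepted scheme.
ACCEPTED_SCHEMES = ("Bearer", "Api-Key")


def auth_header(api_key: str, scheme: str = "Api-Key") -> dict:
    """Return a headers dict of the form {"Authorization": "<scheme> <key>"}."""
    if scheme not in ACCEPTED_SCHEMES:
        raise ValueError(f"scheme must be one of {ACCEPTED_SCHEMES}, got {scheme!r}")
    return {"Authorization": f"{scheme} {api_key}"}


# Both forms carry the same key; only the scheme prefix differs.
bearer_headers = auth_header("YOUR_API_KEY", scheme="Bearer")
api_key_headers = auth_header("YOUR_API_KEY")
```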
Path Parameters
Body
A request to update autoscaling settings for a deployment. All fields are optional; only the fields included in the request are updated, and omitted fields keep their current values.
Minimum number of replicas. Example: 0
Maximum number of replicas. Example: 7
Timeframe of traffic considered for autoscaling decisions. Example: 600
Waiting period before scaling down any active replica. Example: 120
Number of requests per replica before scaling up. Example: 2
Target utilization percentage for scaling up or down. Example: 70
Target number of in-flight tokens for autoscaling decisions. Early access only. Example: 40000
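The partial-update behavior can be sketched as follows: build a body that contains only the settings you want to change, then send it as a PATCH request. This section does not show the endpoint URL or the JSON field names, so the URL pattern and the keys used here (`min_replica`, `max_replica`, `autoscaling_window`, `scale_down_delay`, `concurrency_target`, `target_utilization_percentage`) are assumptions; check them against the request schema before use. The early-access in-flight token setting is left out. The sketch only constructs the request and does not send it.

```python
import json
import urllib.request
from typing import Optional

# Assumed base URL and path pattern; neither is stated in this section.
BASE_URL = "https://api.baseten.co/v1"


def build_autoscaling_payload(
    min_replica: Optional[int] = None,
    max_replica: Optional[int] = None,
    autoscaling_window: Optional[int] = None,
    scale_down_delay: Optional[int] = None,
    concurrency_target: Optional[int] = None,
    target_utilization_percentage: Optional[int] = None,
) -> dict:
    """Return a request body containing only the fields the caller set.

    Field names are assumed. `is not None` is used so that a legitimate
    value of 0 (for example, zero minimum replicas) is still included.
    """
    candidates = {
        "min_replica": min_replica,
        "max_replica": max_replica,
        "autoscaling_window": autoscaling_window,
        "scale_down_delay": scale_down_delay,
        "concurrency_target": concurrency_target,
        "target_utilization_percentage": target_utilization_percentage,
    }
    return {name: value for name, value in candidates.items() if value is not None}


def build_request(model_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Construct (but do not send) a PATCH request for the production deployment."""
    url = f"{BASE_URL}/models/{model_id}/deployments/production/autoscaling_settings"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )


# Change only the replica bounds; every other setting is omitted from the body.
payload = build_autoscaling_payload(min_replica=0, max_replica=7)
request = build_request("YOUR_MODEL_ID", "YOUR_API_KEY", payload)
# To send it: urllib.request.urlopen(request)
```

Omitting a field entirely, rather than sending it as null, matches the stated rule that only fields passed in are updated.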