Development model deployment
Updates a development deployment’s autoscaling settings and returns the update status.
Authorizations
Pass your Baseten API key. Clients automatically send Authorization: Bearer <key>. Direct callers can also use Authorization: Api-Key <key>; both schemes are accepted.
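As a minimal sketch of the two accepted header schemes, the helper below builds the `Authorization` header for either one. The function name `auth_headers` and the placeholder key are illustrative assumptions, not part of any Baseten client.

```python
def auth_headers(api_key: str, scheme: str = "Api-Key") -> dict[str, str]:
    """Build an Authorization header using one of the two accepted schemes.

    `auth_headers` is a hypothetical helper for illustration only.
    """
    if scheme not in ("Api-Key", "Bearer"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    # Both forms carry the same key; only the scheme prefix differs.
    return {"Authorization": f"{scheme} {api_key}"}


if __name__ == "__main__":
    print(auth_headers("YOUR_API_KEY"))                   # Api-Key scheme
    print(auth_headers("YOUR_API_KEY", scheme="Bearer"))  # Bearer scheme
```

The resulting dictionary can be passed as the `headers` argument of whichever HTTP client you use.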
Path Parameters
Body
A request to update autoscaling settings for a deployment. All fields are optional; only the fields included in the request are updated.
Minimum number of replicas
0
Maximum number of replicas
7
Timeframe of traffic considered for autoscaling decisions
600
Waiting period before scaling down any active replica
120
Number of requests per replica before scaling up
2
Target utilization percentage for scaling up or down
70
Target number of in-flight tokens for autoscaling decisions. Early access only.
40000
Maximum percentage of replicas that can be removed per autoscaling window (1–50). E.g. 20 means at most 20% of replicas are removed per window.
1 <= x <= 50
20
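Because only the fields present in the request are updated, a client can build the body from just the settings it wants to change. The sketch below shows one way to do that and to check the 1 to 50 range before sending. Every field name in it (`min_replica`, `max_scale_down_rate`, and so on) is a placeholder assumption, since this listing does not show the actual JSON keys; confirm them against the API reference before use.

```python
from typing import Any

# Hypothetical field names: the listing above omits the real JSON keys.
ALLOWED_FIELDS = {
    "min_replica",
    "max_replica",
    "autoscaling_window",
    "scale_down_delay",
    "concurrency_target",
    "target_utilization_percentage",
    "target_in_flight_tokens",
    "max_scale_down_rate",
}


def build_autoscaling_update(**settings: Any) -> dict[str, Any]:
    """Return a partial-update body containing only the settings that were passed."""
    unknown = set(settings) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    # Drop unset (None) values so untouched settings are left as they are.
    # A value of 0 is kept, because 0 is a valid minimum replica count.
    body = {name: value for name, value in settings.items() if value is not None}

    # The documented range for the scale-down percentage is 1 <= x <= 50.
    rate = body.get("max_scale_down_rate")
    if rate is not None and not 1 <= rate <= 50:
        raise ValueError("max_scale_down_rate must be between 1 and 50")
    return body


if __name__ == "__main__":
    # Change only the replica bounds; all other settings stay untouched.
    print(build_autoscaling_update(min_replica=0, max_replica=7))
```

Validating the range on the client side is a design choice for faster feedback; the server remains the authority on what it accepts.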