Limits
A default limit of 100 requests per second applies to every `/v1/*` endpoint. A few endpoints that touch shared build and deployment infrastructure have stricter limits.
| Endpoint | Limit |
|---|---|
| `POST /v1/models/{model_id}/deployments/development/activate` | 20 requests/minute |
| `POST /v1/models/{model_id}/deployments/production/activate` | 20 requests/minute |
| `POST /v1/models/{model_id}/deployments/{deployment_id}/activate` | 20 requests/minute |
| `POST /v1/models/{model_id}/deployments/development/deactivate` | 20 requests/minute |
| `POST /v1/models/{model_id}/deployments/production/deactivate` | 20 requests/minute |
| `POST /v1/models/{model_id}/deployments/{deployment_id}/deactivate` | 20 requests/minute |
| `POST /v1/models/{model_id}/deployments/development/retry` | 10 requests/minute |
| `POST /v1/models/{model_id}/deployments/production/retry` | 10 requests/minute |
| `POST /v1/models/{model_id}/deployments/{deployment_id}/retry` | 10 requests/minute |
| All other `/v1/*` endpoints | 100 requests/second |
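To pace calls on the client side, the limits in the table can be encoded as data and turned into a minimum spacing between requests (for example, 20 requests/minute works out to one call every 3 seconds). The following is a minimal sketch in Python; `STRICT_LIMITS`, `limit_for`, and `min_interval` are hypothetical names, not part of any official client library.

```python
import re
from typing import Tuple

# Hypothetical encoding of the stricter limits from the table:
# (HTTP method, path pattern, requests allowed, window in seconds).
# "[^/]+" matches one path segment, so the deployment segment covers
# "development", "production", and any {deployment_id}.
STRICT_LIMITS = [
    ("POST", r"^/v1/models/[^/]+/deployments/[^/]+/(activate|deactivate)$", 20, 60),
    ("POST", r"^/v1/models/[^/]+/deployments/[^/]+/retry$", 10, 60),
]
DEFAULT_LIMIT = (100, 1)  # all other /v1/* endpoints: 100 requests/second


def limit_for(method: str, path: str) -> Tuple[int, int]:
    """Return (max_requests, window_seconds) for a management API call."""
    for m, pattern, count, window in STRICT_LIMITS:
        if method == m and re.match(pattern, path):
            return count, window
    return DEFAULT_LIMIT


def min_interval(method: str, path: str) -> float:
    """Smallest even spacing between calls that stays within the limit."""
    count, window = limit_for(method, path)
    return window / count
```

For instance, `min_interval("POST", "/v1/models/abc/deployments/production/retry")` gives 6.0 seconds, which a script could sleep between successive retry calls.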
Rate-limited responses
A request over the limit returns `429 Too Many Requests`, and the response body includes a `retry_after` field.
`retry_after` is the number of seconds until the current rate-limit window resets. Wait at least that long before retrying.
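A minimal sketch of reading the wait time from a response, assuming the 429 body is a JSON object with a numeric `retry_after` field; the rest of the body's shape is not specified here. `seconds_to_wait` and its 1-second fallback for a missing or malformed field are hypothetical choices, not documented API behavior.

```python
import json
from typing import Optional


def seconds_to_wait(status: int, body: str) -> Optional[float]:
    """Return how long to wait before retrying, or None if not rate-limited."""
    if status != 429:
        return None
    try:
        # Assumption: the body is JSON like {"retry_after": 12, ...}.
        retry_after = float(json.loads(body)["retry_after"])
    except (ValueError, KeyError, TypeError):
        # Hypothetical fallback when the field is missing or malformed.
        retry_after = 1.0
    return max(retry_after, 0.0)
```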
Retry handling
For CI pipelines or scripts that call the management API in a loop, handle `429` explicitly: wait for `retry_after` seconds instead of retrying immediately. A tight retry loop only wastes API calls, because the server rejects every request until the window resets.
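The loop below sketches that retry handling in Python. It assumes your HTTP client is wrapped in a hypothetical `send()` callable that returns the status code and the parsed JSON body; `call_with_backoff`, `max_attempts`, and the 1-second fallback when `retry_after` is absent are illustrative choices, not part of the API.

```python
import time
from typing import Callable, Dict, Tuple

# A response is modeled as (HTTP status, parsed JSON body).
Response = Tuple[int, Dict]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting out each 429 for retry_after seconds before retrying."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        # Wait at least retry_after seconds; a sooner retry would be rejected.
        # The default of 1 second is a hypothetical fallback.
        sleep(float(body.get("retry_after", 1)))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter lets you test the loop without real delays; in a CI job, the default `time.sleep` is what you want. Capping attempts keeps a persistently throttled job from hanging indefinitely.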