- Request rate limits: Maximum API requests per minute.
- Token rate limits: Maximum tokens processed per minute (input + output combined).
If your workspace is on the Basic (unverified) tier and you need the higher Basic (verified) limits, contact us to request verification. To move to the Pro or Enterprise tier, contact us through the same form.
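To stay under both limits from the client side, one option is to track recent calls in a rolling 60-second window and check the request count and token total before sending. The sketch below is illustrative only: `MinuteWindowLimiter`, `try_acquire`, and the limit values are hypothetical names and numbers, not part of the Model APIs. It assumes a simple sliding window, which may differ from how the service actually measures a minute. Because output tokens are not known before a call, the caller passes an estimate of input plus output tokens.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side guard for per-minute request and token limits (hypothetical helper)."""

    def __init__(self, max_requests, max_tokens, clock=time.monotonic):
        self.max_requests = max_requests  # assumed request limit per minute
        self.max_tokens = max_tokens      # assumed token limit per minute (input + output)
        self.clock = clock                # injectable clock, useful for tests
        self.events = deque()             # (timestamp, tokens) for calls in the window

    def _prune(self, now):
        # Drop calls that fell out of the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Record a call and return True if it fits both limits, otherwise False."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.max_requests:
            return False  # would exceed the request rate limit
        if used_tokens + tokens > self.max_tokens:
            return False  # would exceed the token rate limit
        self.events.append((now, tokens))
        return True
```

A caller would check `try_acquire(estimated_tokens)` before each request and wait or queue the request when it returns `False`. A guard like this reduces rejected requests but does not replace handling the rate-limit responses the service returns.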
Set budgets
Budgets let you control Model API usage and avoid unexpected costs. Budgets apply only to Model APIs, not dedicated deployments. Your team receives email notifications at 75%, 90%, and 100% of budget.
Enforce budgets
Budgets can be enforced or non-enforced:
- Enforced: Requests are rejected when the budget is reached.
- Not enforced: You receive notifications but remain responsible for costs over the budget.
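The notification thresholds and the two enforcement modes can be modeled in a few lines, for example to mirror the behavior in your own monitoring. This is a minimal sketch under stated assumptions: `budget_status` and `NOTIFY_PERCENTS` are hypothetical names, spend and budget are assumed to be in the same currency unit, and "reached" is assumed to mean spend at or above 100% of the budget. It is not the platform's implementation.

```python
# Notification thresholds from the text above, as whole percentages.
NOTIFY_PERCENTS = (75, 90, 100)


def budget_status(spend, budget, enforced):
    """Return (thresholds_crossed, requests_rejected) for a given spend.

    Hypothetical helper: spend and budget share one unit; budget must be positive.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend * 100 against pct * budget to avoid a division.
    crossed = [pct for pct in NOTIFY_PERCENTS if spend * 100 >= pct * budget]
    # Only an enforced budget causes requests to be rejected once it is reached.
    rejected = enforced and spend >= budget
    return crossed, rejected
```

For example, `budget_status(80, 100, enforced=True)` reports only the 75% threshold as crossed and no rejection, while the same call with a spend of 100 reports all three thresholds and a rejection.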
Next steps
- Inference errors: Handle 429 Too Many Requests and other status codes
- Model APIs overview: Supported models, pricing, and feature support