- Request rate limits: Maximum API requests per minute.
- Token rate limits: Maximum tokens processed per minute (input + output combined).
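To stay under both limits from the client side, you can throttle on request count and token count together within a rolling 60-second window. The sketch below is a minimal illustration, not part of any SDK. `RateLimiter` and `try_acquire` are hypothetical names, and the limit values are placeholders for your own account's limits. Output tokens are only known after a response, so the sketch assumes you pass an estimate, for example input tokens plus your maximum output tokens.

```python
import time
from collections import deque


class RateLimiter:
    """Hypothetical client-side throttle for per-minute request and token limits."""

    def __init__(self, max_requests_per_min, max_tokens_per_min, clock=time.monotonic):
        self.max_requests = max_requests_per_min
        self.max_tokens = max_tokens_per_min
        self.clock = clock      # injectable clock, which makes the limiter easy to test
        self.window = deque()   # (timestamp, tokens) for each request in the last 60 s

    def _evict(self, now):
        # Drop requests that have aged out of the 60-second window.
        while self.window and now - self.window[0][0] >= 60:
            self.window.popleft()

    def try_acquire(self, tokens):
        """Admit a request of `tokens` (estimated input + output) if both limits allow it."""
        now = self.clock()
        self._evict(now)
        if len(self.window) + 1 > self.max_requests:
            return False  # request rate limit would be exceeded
        used_tokens = sum(t for _, t in self.window)
        if used_tokens + tokens > self.max_tokens:
            return False  # token rate limit would be exceeded
        self.window.append((now, tokens))
        return True
```

With limits of 2 requests and 1,000 tokens per minute, `try_acquire(400)` succeeds, a following `try_acquire(700)` is refused because the window would hold 1,100 tokens, and once 60 seconds pass the window empties and requests are admitted again.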
Set budgets
Budgets let you control Model API usage and avoid unexpected costs. Budgets apply only to Model APIs, not to dedicated deployments. Your team receives email notifications when usage reaches 75%, 90%, and 100% of the budget.
Enforce budgets
Budgets can be enforced or not enforced:
- Enforced: Requests are rejected when the budget is reached.
- Not enforced: You receive notifications but remain responsible for costs over the budget.
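The difference between the two modes can be modeled as follows. This is a hypothetical sketch of the behavior described above, not the platform's implementation. `Budget`, `record`, and `BudgetExceeded` are illustrative names. The sketch assumes each request is checked against the budget before it runs, so the request that crosses 100% is still billed, and only later requests are rejected when the budget is enforced.

```python
from dataclasses import dataclass, field

# Notification thresholds from the text above, as percentages of the budget.
THRESHOLDS_PCT = (75, 90, 100)


class BudgetExceeded(Exception):
    """Raised when an enforced budget rejects a request."""


@dataclass
class Budget:
    limit: float                                # budget amount (currency is an assumption)
    enforced: bool                              # True: reject requests once the budget is reached
    spent: float = 0.0
    notified: set = field(default_factory=set)  # thresholds already reported

    def record(self, cost):
        """Account for one request's cost and return the thresholds newly crossed."""
        # Enforced budgets reject requests once spend has reached the budget.
        if self.enforced and self.spent >= self.limit:
            raise BudgetExceeded(f"budget of {self.limit} reached")
        # Otherwise the request goes through and its cost is added to spend.
        self.spent += cost
        crossed = []
        for pct in THRESHOLDS_PCT:
            if pct not in self.notified and self.spent * 100 >= pct * self.limit:
                self.notified.add(pct)
                crossed.append(pct)  # stand-in for the platform's email notification
        return crossed
```

With a budget of 100, recording costs of 75, 20, and 10 reports the 75%, 90%, and 100% thresholds in turn. With `enforced=True` the next request raises `BudgetExceeded`; with `enforced=False` spending simply continues past the budget.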