Baseten enforces two rate limits to ensure fair use and system stability:
  • Request rate limits: Maximum API requests per minute.
  • Token rate limits: Maximum tokens processed per minute (input + output combined).
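To stay under both limits from the client side, you can track your own requests and tokens over the last minute before sending. The sketch below is one way to do that, not Baseten's implementation: the class name `MinuteWindowLimiter`, the sliding 60-second window, and the idea of passing an estimated token count are all assumptions made for illustration. How the service itself measures a minute is not specified here.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteWindowLimiter:
    """Hypothetical client-side tracker for requests and tokens in the last 60 s."""

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm          # maximum requests per minute
        self.tpm = tpm          # maximum tokens per minute (input + output)
        self.clock = clock      # injectable clock, useful for testing
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _prune(self, now: float) -> None:
        # Drop entries that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record a request of `tokens` tokens if it fits both limits."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

Because output length is not known before a request completes, `tokens` has to be an estimate (for example, input tokens plus the maximum output length you allow). When `try_acquire` returns `False`, wait and try again rather than sending the request.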
Default limits vary by account status.
Account             RPM      TPM
Basic (unverified)  15       100,000
Basic (verified)    120      500,000
Pro                 120      1,000,000
Enterprise          Custom   Custom
If you exceed these limits, the API returns a 429 Too Many Requests error. To request a rate limit increase, contact us.
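A common way to handle a 429 response is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, and `send` stands in for whatever function issues your request and returns a `(status_code, body)` pair. The retry count and delays are illustrative defaults, not values recommended by Baseten.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it returns 429."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; return the last 429 to the caller
        # Delay doubles each attempt (1 s, 2 s, 4 s, ...) plus random jitter
        # so that many clients do not retry at the same instant.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
    return status, body
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting. If retries are exhausted, the caller still receives the final 429 and can decide whether to queue the work or surface the error.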

Set budgets

Budgets let you control Model API usage and avoid unexpected costs. Budgets apply only to Model APIs, not dedicated deployments. Your team receives email notifications at 75%, 90%, and 100% of budget.
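To make the notification thresholds concrete, the sketch below computes which of the 75%, 90%, and 100% marks a given spend has reached. It is an illustration only: `crossed_thresholds` is a hypothetical helper, and treating a threshold as reached when spend is equal to or above it is an assumption about boundary behavior.

```python
from typing import List

# Notification thresholds as percentages of the budget.
NOTIFY_PERCENTS = (75, 90, 100)


def crossed_thresholds(spend: float, budget: float) -> List[int]:
    """Return the notification percentages that `spend` has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare spend * 100 against pct * budget to avoid fractional thresholds.
    return [pct for pct in NOTIFY_PERCENTS if spend * 100 >= pct * budget]
```

For example, with a budget of 200, a spend of 150 reaches only the 75% mark, while a spend of 180 reaches both the 75% and 90% marks.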

Enforce budgets

Budgets can be enforced or non-enforced:
  • Enforced: Requests are rejected when the budget is reached.
  • Not enforced: You receive notifications but remain responsible for costs over the budget.
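The difference between the two modes can be modeled in a few lines. This is a simplified sketch of the behavior described above, not Baseten's implementation: the `Budget` class and `admit` function are hypothetical names, and treating "reached" as spend equal to or above the limit is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    limit: float    # budget amount
    enforced: bool  # True: reject at the limit; False: notify only


def admit(budget: Budget, spend: float) -> bool:
    """Return True if a new request would be accepted under this model."""
    if budget.enforced and spend >= budget.limit:
        return False  # enforced budget reached: request is rejected
    return True       # under budget, or budget is not enforced
```

With an enforced budget of 100, `admit` returns `False` once spend reaches 100. With a non-enforced budget, it keeps returning `True` and costs continue to accrue past the limit.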