Rate Limits

To ensure fair use and system stability, Baseten enforces two rate limits:

  • Request rate limits — maximum number of API requests per minute
  • Token rate limits — maximum number of tokens processed per minute (input + output combined)

Default limits vary based on your account status.

Account                RPM      TPM
Starter (unverified)   100      100,000
Starter (verified)     3000     500,000
Pro                    6000+    2,000,000+
Enterprise             Custom   Custom
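
To stay under both limits, a client can track its own usage before sending each request. The following is a minimal sketch, not part of any Baseten SDK: the class name `SlidingWindowLimiter`, its parameters, and the 60-second sliding window are assumptions made for illustration, and the server may account for usage differently. Because the token limit counts input and output combined, the caller is assumed to pass an estimate of total tokens for each request.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side tracker for per-minute request and token usage.

    Hypothetical helper for illustration; the server's own accounting
    window may not match this sliding window.
    """

    def __init__(self, rpm, tpm, window=60.0, clock=time.monotonic):
        self.rpm = rpm          # allowed requests per window
        self.tpm = tpm          # allowed tokens per window (input + output)
        self.window = window    # window length in seconds
        self.clock = clock      # injectable clock, handy for testing
        self.events = deque()   # (timestamp, tokens) per recorded request

    def _prune(self, now):
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both limits."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm:
            return False
        if used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would construct the limiter with the RPM and TPM values for its account tier and hold back or delay any request for which `try_acquire` returns `False`.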

If you exceed these limits, the API will return a 429 Too Many Requests error.
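
A common way to handle a 429 response is to retry with exponential backoff. The sketch below assumes a hypothetical zero-argument callable `send` that performs the request and returns a `(status, body)` tuple; the function name, retry count, and delay values are illustrative choices, not documented defaults.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns 429.

    send is a hypothetical callable returning (status, body); it stands in
    for whatever HTTP client call the application actually makes.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; return the last 429 to the caller
        # Exponential delay capped at max_delay, with full jitter so that
        # many clients do not retry at the same instant.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    return status, body
```

The jitter spreads retries out over time, which helps avoid hitting the limit again immediately when several workers are throttled together.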

Requesting higher limits

If you are a verified customer with high request volume and need more headroom, contact us to request a rate limit increase.


Setting budgets

Setting a budget allows you to control your Model API usage and avoid unexpected costs. Usage budgets apply only to Model APIs (not dedicated deployments). Your team will be notified by email at 75%, 90%, and 100% of budget.

Enforcing budgets

When setting a budget, you can choose whether to enforce it.

  • If you choose to enforce it, requests will be rejected once the budget is reached.
  • If you choose not to enforce it, you will still be notified at 75%, 90%, and 100% of the budget, and you will be responsible for any costs incurred beyond it.
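
The behavior described above can be modeled in a few lines. This is an illustrative sketch only: the function names and the idea of comparing a spend figure against the budget are assumptions, not the actual billing implementation. Only the 75%, 90%, and 100% thresholds and the reject-when-enforced rule come from the text.

```python
NOTIFY_PERCENTS = (75, 90, 100)  # notification thresholds named in the text


def crossed_thresholds(previous_spend, current_spend, budget):
    """Return the notification percentages crossed by this spend increase."""
    # Integer percentages keep the comparison exact for integer amounts.
    return [
        pct for pct in NOTIFY_PERCENTS
        if previous_spend * 100 < pct * budget <= current_spend * 100
    ]


def request_allowed(current_spend, budget, enforced):
    """An enforced budget rejects requests once spend reaches the budget."""
    return not (enforced and current_spend >= budget)
```

For example, with a budget of 100, a jump in spend from 70 to 95 crosses the 75% and 90% thresholds at once, and a request made at a spend of 100 is rejected only when the budget is enforced.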