Baseten enforces two rate limits to ensure fair use and system stability:
  • Request rate limits: Maximum API requests per minute.
  • Token rate limits: Maximum tokens processed per minute (input + output combined).
Default limits vary by account tier. RPM is requests per minute and TPM is tokens per minute.
| Account            | RPM    | TPM       |
|--------------------|--------|-----------|
| Basic (unverified) | 15     | 100,000   |
| Basic (verified)   | 120    | 500,000   |
| Pro                | 120    | 1,000,000 |
| Enterprise         | Custom | Custom    |
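
To stay under both limits before a request is ever sent, a client can track its own usage. The sketch below is a minimal illustration, not Baseten's implementation: it assumes a rolling 60-second window (the server's actual windowing is not documented here), and the class name `MinuteWindowLimiter` and its methods are hypothetical. The `tokens` argument is your own estimate of input plus output tokens for the request.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteWindowLimiter:
    """Client-side guard for a requests-per-minute and tokens-per-minute cap.

    Assumption: usage is counted over a rolling 60-second window.
    """

    def __init__(self, rpm: int, tpm: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        # Each entry is (timestamp, tokens) for one admitted request.
        self.events: Deque[Tuple[float, int]] = deque()
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that are at least 60 seconds old.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Return True and record the request if it fits under both caps."""
        now = self.clock()
        self._evict(now)
        if len(self.events) + 1 > self.rpm:
            return False  # would exceed the request rate limit
        if self.tokens_in_window + tokens > self.tpm:
            return False  # would exceed the token rate limit
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


# Example: the Basic (unverified) defaults from the table above.
limiter = MinuteWindowLimiter(rpm=15, tpm=100_000)
if limiter.try_acquire(tokens=2_000):
    pass  # send the request here
```

When `try_acquire` returns False, the caller can wait and try again instead of sending a request that is likely to be rejected.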
If your workspace is on the Basic (unverified) tier and you need the higher Basic (verified) limits, contact us to request verification. To move to the Pro or Enterprise tier, contact us through the same form.
If you exceed these limits, the API returns a 429 Too Many Requests error. See Inference errors for how to respond. To request a rate limit increase, contact us.
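
A common way to respond to a 429 is to retry with exponential backoff and jitter. The following is a minimal sketch under stated assumptions: `call_with_backoff` and its parameters are hypothetical names, and `send` stands in for whatever function issues your request and returns a `(status_code, body)` pair. The retry counts and delays are illustrative defaults, not values recommended by Baseten.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry while it returns 429 Too Many Requests.

    Returns the first non-429 response, or the last 429 response once
    max_retries retries have been used.
    """
    attempt = 0
    while True:
        status, body = send()
        if status != 429 or attempt >= max_retries:
            return status, body
        # Exponential backoff capped at max_delay, with full jitter so
        # that many clients do not retry at the same instant.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
        attempt += 1
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting.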

Set budgets

Budgets let you control Model API usage and avoid unexpected costs. Budgets apply only to Model APIs, not dedicated deployments. Your team receives email notifications at 75%, 90%, and 100% of budget.

Enforce budgets

Budgets can be enforced or non-enforced:
  • Enforced: Requests are rejected when the budget is reached.
  • Not enforced: You receive notifications but remain responsible for costs over the budget.
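
The interaction between notification thresholds and enforcement can be modeled in a few lines. This is an illustrative sketch only, not how Baseten computes budgets internally: the function `budget_status` and the `BudgetStatus` type are hypothetical, and spend and budget are assumed to be in the same currency unit.

```python
from dataclasses import dataclass
from typing import List

# Notification thresholds described above, as percentages of the budget.
NOTIFY_PERCENTS = (75, 90, 100)


@dataclass
class BudgetStatus:
    notifications_sent: List[int]  # thresholds reached so far
    requests_rejected: bool        # True only for enforced budgets at 100%


def budget_status(spend: float, budget: float, enforced: bool) -> BudgetStatus:
    """Model which notifications have fired and whether requests are rejected."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    crossed = [p for p in NOTIFY_PERCENTS if spend * 100 >= p * budget]
    reached = spend >= budget
    return BudgetStatus(crossed, enforced and reached)


# An enforced budget that has been fully used rejects further requests.
status = budget_status(spend=100.0, budget=100.0, enforced=True)
```

With a non-enforced budget, the same spend triggers the same notifications but `requests_rejected` stays False, so costs can continue past the budget.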

Next steps

  • Inference errors: Handle 429 Too Many Requests and other status codes.
  • Model APIs overview: Supported models, pricing, and feature support.