Baseten enforces two rate limits to ensure fair use and system stability:
Request rate limits: Maximum API requests per minute (RPM).
Token rate limits: Maximum tokens processed per minute (TPM), counting input and output tokens combined.
Default limits vary by account status.
Account               RPM      TPM
Basic (unverified)    15       100,000
Basic (verified)      120      500,000
Pro                   120      1,000,000
Enterprise            Custom   Custom
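One way to stay under both limits is to throttle on the client before sending. The sketch below tracks requests and tokens over a sliding 60-second window. It is only an illustration: the class name MinuteWindowLimiter is hypothetical, the sliding window is an assumption (this page does not say how the server measures a minute), and the token count passed in is your own estimate of input plus output tokens.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Hypothetical client-side guard for an RPM limit and a TPM limit.

    Assumes a sliding 60-second window; the server's actual windowing
    is not specified in this document.
    """

    def __init__(self, rpm, tpm, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each admitted request
        self.tokens_in_window = 0

    def _evict(self, now):
        # Forget requests that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, estimated_tokens):
        """Return True and record the request if it fits both limits."""
        now = self.clock()
        self._evict(now)
        if len(self.events) + 1 > self.rpm:
            return False
        if self.tokens_in_window + estimated_tokens > self.tpm:
            return False
        self.events.append((now, estimated_tokens))
        self.tokens_in_window += estimated_tokens
        return True
```

For an unverified Basic account, this would be constructed as MinuteWindowLimiter(rpm=15, tpm=100_000). When try_acquire returns False, the caller can wait briefly and try again rather than sending a request that is likely to be rejected.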
If you exceed these limits, the API returns a 429 Too Many Requests error.
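A common way to handle a 429 response is to retry with exponential backoff. The following is a minimal sketch, not an official client: call_with_backoff and its parameters are hypothetical, and send stands in for whatever function performs your HTTP request and returns a (status_code, body) pair.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() and retry while it reports HTTP 429.

    send is a caller-supplied function returning (status_code, body).
    The delay doubles after each 429, is capped at max_delay, and gets
    up to 10% random jitter so parallel clients do not retry in lockstep.
    """
    status, body = None, None
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))
    # Retries exhausted: hand the last 429 back to the caller.
    return status, body
```

The retry count and delays here are arbitrary starting points. If requests are still rejected after backing off, the workload probably needs a higher limit rather than more retries.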
To request a rate limit increase, contact us.
Set budgets
Budgets let you control Model API usage and avoid unexpected costs. Budgets apply only to Model APIs, not dedicated deployments. Your team receives email notifications when usage reaches 75%, 90%, and 100% of the budget.
Enforce budgets
Budgets can be enforced or non-enforced:
Enforced: Requests are rejected when the budget is reached.
Not enforced: You receive notifications but remain responsible for costs over the budget.
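The budget rules above can be modeled in a few lines, which may help when reasoning about what happens at a given spend level. This is a sketch of the described behavior only: budget_status and its return shape are hypothetical, and it does not call any Baseten API.

```python
NOTIFY_THRESHOLDS = (75, 90, 100)  # percent of budget, as listed in this document


def budget_status(spend, budget, enforced):
    """Return which notification thresholds are reached and whether
    new requests would be rejected under the rules described above.

    spend and budget are amounts in the same currency unit.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Compare with multiplication rather than division to avoid
    # floating-point rounding exactly at a threshold.
    reached = [t for t in NOTIFY_THRESHOLDS if spend * 100 >= t * budget]
    rejecting = enforced and spend >= budget
    return {"notified_at": reached, "requests_rejected": rejecting}
```

For example, with a budget of 200 and spend of 180, the 75% and 90% notifications have fired and requests still go through. At 200 or more, requests are rejected only if the budget is enforced.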