You have a model deployed on Baseten and want to give your own customers access through your branded domain, with credentials you control and usage you meter. Baseten Frontier Gateway is the managed API gateway that makes this possible. It adds a federated-user resource model, per-user rate and usage limits, billing webhooks, and white-label routing on top of your Dedicated deployment, so your customers call your model through your domain with keys you mint and revoke through the Baseten REST API.Documentation Index
Fetch the complete documentation index at: https://docs.baseten.co/llms.txt
Use this file to discover all available pages before exploring further.
Frontier Gateway is enabled for your workspace by a Baseten engineer. To turn it on, talk to us.
How Frontier Gateway works
Frontier Gateway sits on top of an existing Dedicated deployment. You model each downstream customer as a federated user: a resource that owns the customer’s externalcustomer_id, the set of model slugs they’re allowed to call, and the rate and usage limits enforced on every call. You then mint one or more API keys under each user; those keys are what your customer uses.
When your customer calls the gateway with the federated key you issued them, Baseten validates the key, looks up the owning federated user, and enforces the user’s per-model rate and usage limits. Valid requests route to your Dedicated deployment, and the response returns to the caller. For each request, Baseten emits a billing event out-of-band to your webhook endpoint with token counts and request metadata, so your billing pipeline runs independently of the inference path.
Key features
- Federated user and key management: Model each downstream customer as a federated user, then mint, list, and revoke per-customer keys under them. For more information, see Manage federated users and API keys.
- Per-user, per-model rate and usage limits: Configure token or request limits on each user, scoped per model slug. Every key minted under the user inherits them, so rotating credentials doesn’t change what the customer can spend. For more information, see Rate and usage limits.
- Billing webhooks: Receive signed per-request token usage events you can pipe into Stripe, Orb, or your own billing system. For more information, see Billing webhooks.
- White-label routing (coming soon): Serve inference traffic from your branded domain so downstream customers never see the Baseten URL. Contact your onboarding engineer for current availability.
Frontier Gateway versus Model APIs
Frontier Gateway and Model APIs are distinct products with separate endpoints. Frontier Gateway management lives under/v1/gateway/ and is gated to Frontier Gateway customers; public Model APIs customers authenticate with their workspace API key and call inference at /v1/chat/completions directly. Use the table below to confirm which product you need.
| Frontier Gateway | Model APIs | |
|---|---|---|
| Who it’s for | AI labs serving their own hosted model to downstream customers | App developers calling a Baseten-hosted open model |
| Authentication | Federated API keys you mint per customer | Your workspace API key |
| Compute | Your Dedicated deployment | Shared Baseten infrastructure |
| Documentation | Frontier Gateway | Model APIs |
Next steps
- Get started: Walk through your first federated user, API key, and inference call.
- Manage federated users and API keys: Upsert users, mint and revoke keys, and soft-delete users.
- Rate and usage limits: Control per-customer usage with per-user, per-model limits.
- Billing webhooks: Meter usage by consuming signed per-request events.