Baseten Frontier Gateway

You have a model deployed on Baseten and want to give your own customers access through your branded domain, with credentials you control and usage you meter. Baseten Frontier Gateway is the managed API gateway that makes this possible. It adds a hierarchical group resource model, per-group rate and usage limits with inheritance, billing webhooks, and white-label routing on top of your Dedicated deployment, so your customers call your model through your domain with keys you mint and revoke through the Baseten REST API.

Frontier Gateway is enabled for your workspace by a Baseten engineer. To turn it on, talk to us.

How Frontier Gateway works

Frontier Gateway sits on top of an existing Dedicated deployment. You model your customers, plans, and projects as a tree of groups. Each group owns an external identifier (metadata.external_entity_id), the set of model slugs it’s allowed to call, and the rate and usage limits enforced on every call. Groups can nest under a parent group, and limits flow down the tree according to the group’s limit_enforcement mode.

You then mint one or more API keys under any group; those keys are what your customer uses. Every key inherits the effective config of its group, so rotating credentials never changes what the customer can spend.

When a request hits the gateway with one of your federated keys, Baseten validates the key, walks up the owning group’s hierarchy to compute effective limits, and enforces them per model slug. Valid requests route to your Dedicated deployment, and the response returns to the caller. For each request, Baseten emits a signed billing event out-of-band to your webhook endpoint with token counts and request metadata, so your billing pipeline runs independently of the inference path.
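To make the hierarchy walk concrete, here is a minimal in-memory sketch of how limits might be resolved and usage metered for a group tree. It is an illustration only, not Baseten's implementation: the `Group` class, the `allow` function, the mode strings `"independent"` and `"cascading"`, and the slug `my-model` are all hypothetical names, and the rule that the nearest group with a limit wins in an independent hierarchy is an assumption drawn from the description of the two inheritance modes under Key features. It also tracks a single token counter per slug rather than separate TOKEN and REQUEST limits.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """Hypothetical in-memory stand-in for a gateway group (not the real API)."""
    external_entity_id: str
    parent: Optional["Group"] = None
    limits: dict = field(default_factory=dict)  # model slug -> token limit
    usage: dict = field(default_factory=dict)   # model slug -> tokens used

    def chain(self):
        """Yield this group, then each ancestor up to the root."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def allow(group: Group, slug: str, tokens: int, mode: str) -> bool:
    """Return True and record usage if the request fits under the limits."""
    if mode == "independent":
        # Assumption: the nearest group that sets a limit for this slug wins,
        # and only the calling group's own usage is metered.
        limit = next(
            (g.limits[slug] for g in group.chain() if slug in g.limits), None
        )
        if limit is not None and group.usage.get(slug, 0) + tokens > limit:
            return False
        charged = [group]
    elif mode == "cascading":
        # Usage counts against the group and every ancestor at once, so
        # every limit along the chain must have headroom.
        charged = list(group.chain())
        for g in charged:
            if slug in g.limits and g.usage.get(slug, 0) + tokens > g.limits[slug]:
                return False
    else:
        raise ValueError(f"unknown mode: {mode}")
    for g in charged:
        g.usage[slug] = g.usage.get(slug, 0) + tokens
    return True


# Two teams under one org; team-b sets no limit of its own.
org = Group("org-1", limits={"my-model": 1000})
team_a = Group("team-a", parent=org, limits={"my-model": 800})
team_b = Group("team-b", parent=org)

print(allow(team_a, "my-model", 700, "cascading"))  # True
print(allow(team_b, "my-model", 400, "cascading"))  # False: org would reach 1100 > 1000
```

In the cascading run, team-b is rejected even though it has used nothing itself, because team-a's 700 tokens already count against the shared org limit. Under the independent assumption, the same second call would pass, since team-b's own usage is metered separately against the limit it inherits.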

Key features

Hierarchical groups: Model your organization however your billing structure fits, whether that’s orgs and projects, plans and customers, or tenants and seats. Groups carry the model set and the limits; keys hang off groups and inherit them. For more information, see Manage groups and API keys.
Two inheritance modes: Pick an enforcement mode per hierarchy. An independent hierarchy lets children override their parents and meters each group’s usage separately; a cascading hierarchy makes a group’s usage count against every ancestor at once. For more information, see Inheritance modes.
Per-group, per-model rate and usage limits: Configure TOKEN or REQUEST limits on each group, scoped per model slug. Every key minted under the group inherits the group’s effective limits.
Billing webhooks: Receive signed per-request token usage events you can pipe into Stripe, Orb, or your own billing system. For more information, see Billing webhooks.
White-label routing (coming soon): Serve inference traffic from your branded domain so downstream customers never see the Baseten URL. Contact your onboarding engineer for current availability.
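As an illustration of consuming a signed billing event, the sketch below verifies a signature before parsing the payload. The signing scheme is an assumption: this page does not say how events are signed, so the code assumes a hex-encoded HMAC-SHA256 of the raw request body under a shared secret, which is a common webhook pattern. The function name `verify_and_parse`, the secret, and the payload fields are hypothetical; check the Billing webhooks page for the actual header name, algorithm, and event schema.

```python
import hashlib
import hmac
import json


def verify_and_parse(raw_body: bytes, signature_hex: str, secret: bytes) -> dict:
    """Verify an assumed HMAC-SHA256 hex signature, then decode the JSON body.

    Raises ValueError if the signature does not match.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking match position.
    if not hmac.compare_digest(expected, signature_hex):
        raise ValueError("signature mismatch")
    return json.loads(raw_body)


# Simulate a sender and receiver that share a secret (all values are made up).
secret = b"example-webhook-secret"
body = json.dumps({"external_entity_id": "team-a", "total_tokens": 42}).encode()
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()

event = verify_and_parse(body, signature, secret)
print(event["total_tokens"])  # 42
```

Verifying against the raw bytes, before any JSON parsing or re-serialization, matters because re-encoding the payload can change whitespace or key order and invalidate the signature.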

Frontier Gateway versus Model APIs

Frontier Gateway and Model APIs are distinct products with separate endpoints. Frontier Gateway management lives under /v1/gateway/ and is gated to Frontier Gateway customers; public Model APIs customers authenticate with their workspace API key and call inference at /v1/chat/completions directly. Use the table below to confirm which product you need.

	Frontier Gateway	Model APIs
Who it’s for	AI labs serving their own hosted model to downstream customers	App developers calling a Baseten-hosted open model
Authentication	Federated API keys you mint per group	Your workspace API key
Compute	Your Dedicated deployment	Shared Baseten infrastructure
Documentation	Frontier Gateway	Model APIs

Next steps

Get started: Walk through your first group, API key, and inference call.
Manage groups and API keys: Create groups, build a hierarchy, and mint or revoke keys.
Rate and usage limits: Control per-group, per-model usage and pick an inheritance mode.
Billing webhooks: Meter usage by consuming signed per-request events.
