You have a model deployed on Baseten and want to give your own customers access through your branded domain, with credentials you control and usage you meter. Baseten Frontier Gateway is the managed API gateway that makes this possible. It adds a federated-user resource model, per-user rate and usage limits, billing webhooks, and white-label routing on top of your Dedicated deployment, so your customers call your model through your domain with keys you mint and revoke through the Baseten REST API.
Frontier Gateway is enabled for your workspace by a Baseten engineer. To turn it on, talk to us.

How Frontier Gateway works

Frontier Gateway sits on top of an existing Dedicated deployment. You model each downstream customer as a federated user: a resource that owns the customer’s external customer_id, the set of model slugs they’re allowed to call, and the rate and usage limits enforced on every call. You then mint one or more API keys under each user; those keys are what your customer uses.

When your customer calls the gateway with the federated key you issued them, Baseten validates the key, looks up the owning federated user, and enforces the user’s per-model rate and usage limits. Valid requests route to your Dedicated deployment, and the response returns to the caller. For each request, Baseten emits a billing event out-of-band to your webhook endpoint with token counts and request metadata, so your billing pipeline runs independently of the inference path.
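The request path described above can be sketched as a small in-memory model. This is an illustration only, not Baseten's implementation: the class names, the limit shape (a request count per model slug), the HTTP-style status codes, and the billing event fields are all assumptions made for the example, and whitespace splitting stands in for real token counting.

```python
from dataclasses import dataclass, field


@dataclass
class FederatedUser:
    customer_id: str
    # Hypothetical limit shape: model slug -> max requests in the current window.
    request_limits: dict
    used: dict = field(default_factory=dict)


class GatewaySketch:
    """In-memory stand-in for the gateway's request path (illustrative only)."""

    def __init__(self):
        self.users = {}           # customer_id -> FederatedUser
        self.keys = {}            # API key -> customer_id
        self.billing_events = []  # stand-in for out-of-band webhook delivery

    def add_user(self, user):
        self.users[user.customer_id] = user

    def mint_key(self, customer_id, key):
        self.keys[key] = customer_id

    def revoke_key(self, key):
        self.keys.pop(key, None)

    def handle(self, key, model, prompt):
        # 1. Validate the key and look up the owning federated user.
        customer_id = self.keys.get(key)
        if customer_id is None:
            return 401, "unknown or revoked key"
        user = self.users[customer_id]
        # 2. Enforce the user's allowed models and per-model limit.
        if model not in user.request_limits:
            return 403, "model not allowed for this user"
        if user.used.get(model, 0) >= user.request_limits[model]:
            return 429, "limit exceeded"
        user.used[model] = user.used.get(model, 0) + 1
        # 3. Route to the deployment (an echo stands in for inference).
        completion = f"echo: {prompt}"
        # 4. Record a billing event separately from the response.
        self.billing_events.append({
            "customer_id": customer_id,
            "model": model,
            "prompt_tokens": len(prompt.split()),
            "completion_tokens": len(completion.split()),
        })
        return 200, completion
```

Because usage is tracked on the user rather than on the key, revoking one key and minting another in this sketch leaves the customer's remaining allowance unchanged.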

Key features

  • Federated user and key management: Model each downstream customer as a federated user, then mint, list, and revoke per-customer keys under them. For more information, see Manage federated users and API keys.
  • Per-user, per-model rate and usage limits: Configure token or request limits on each user, scoped per model slug. Every key minted under the user inherits them, so rotating credentials doesn’t change what the customer can spend. For more information, see Rate and usage limits.
  • Billing webhooks: Receive signed per-request token usage events you can pipe into Stripe, Orb, or your own billing system. For more information, see Billing webhooks.
  • White-label routing (coming soon): Serve inference traffic from your branded domain so downstream customers never see the Baseten URL. Contact your onboarding engineer for current availability.
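As a rough illustration of consuming a signed billing event, the sketch below verifies a signature before parsing the payload. The signing scheme is an assumption: this page does not say how events are signed, so the example uses HMAC-SHA256 over the raw request body with a shared secret, and the payload fields are hypothetical. Check the Billing webhooks page for the actual header name, algorithm, and event schema.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    # Assumed scheme: hex-encoded HMAC-SHA256 of the raw request body.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_and_parse(secret: bytes, body: bytes, signature: str):
    """Return the parsed event if the signature matches, else None."""
    expected = sign_payload(secret, body)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    return json.loads(body)


# Hypothetical event body; field names are placeholders, not the real schema.
secret = b"example-shared-secret"
body = json.dumps(
    {"customer_id": "cust_1", "model": "my-model", "total_tokens": 42}
).encode()
signature = sign_payload(secret, body)

event = verify_and_parse(secret, body, signature)
```

Verifying against the raw bytes, before any JSON parsing or re-serialization, matters because re-encoding the payload can change whitespace or key order and invalidate the signature.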

Frontier Gateway versus Model APIs

Frontier Gateway and Model APIs are distinct products with separate endpoints. Frontier Gateway management lives under /v1/gateway/ and is gated to Frontier Gateway customers; public Model APIs customers authenticate with their workspace API key and call inference at /v1/chat/completions directly. Use the table below to confirm which product you need.

|  | Frontier Gateway | Model APIs |
| --- | --- | --- |
| Who it’s for | AI labs serving their own hosted model to downstream customers | App developers calling a Baseten-hosted open model |
| Authentication | Federated API keys you mint per customer | Your workspace API key |
| Compute | Your Dedicated deployment | Shared Baseten infrastructure |
| Documentation | Frontier Gateway | Model APIs |

Next steps