By the end of this guide, you’ll have created a federated user for one of your downstream customers, minted an API key bound to that user, and called your Dedicated deployment through Baseten Frontier Gateway with the key. From here, you can configure additional rate and usage limits, set up billing webhooks, and explore the full federated lifecycle.

Prerequisites

This guide assumes you’ve finished managed onboarding: your workspace is provisioned for federated keys, and your webhook signing secret is in place. If you haven’t started yet, talk to us. The /v1/gateway/ endpoints used here return 403 to workspaces that aren’t onboarded.

Step 1: Create a federated user

A federated user is the resource you create per downstream customer. The user owns the customer’s customer_id, the model slugs they’re allowed to call, and the rate and usage limits enforced on every call. API keys are minted under the user in step 2.

Create a user with POST /v1/gateway/users. The request takes a customer_id you choose (a stable identifier from your own user system) and a non-empty list of model configurations. Each entry pairs a model slug with optional rate and usage limits.
curl --request POST \
  --url https://api.baseten.co/v1/gateway/users \
  --header "Authorization: Api-Key $BASETEN_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "customer_id": "cust_42",
    "models": [
      {
        "slug": "your-org/your-model",
        "rate_limits": [
          { "type": "TOKEN", "unit": "MINUTE", "threshold": 1000000 },
          { "type": "REQUEST", "unit": "MINUTE", "threshold": 100 }
        ],
        "usage_limits": [
          { "type": "TOKEN", "unit": "DAY", "threshold": 10000000 }
        ]
      }
    ]
  }'
The response is the new federated user, including the internal id you’ll use as the path parameter when minting keys:
{
  "id": "abc123hash",
  "customer_id": "cust_42",
  "models": [
    {
      "slug": "your-org/your-model",
      "rate_limits": [
        { "type": "TOKEN", "unit": "MINUTE", "threshold": 1000000 },
        { "type": "REQUEST", "unit": "MINUTE", "threshold": 100 }
      ],
      "usage_limits": [
        { "type": "TOKEN", "unit": "DAY", "threshold": 10000000 }
      ]
    }
  ],
  "created_at": "2026-05-05T12:00:00Z"
}
Save the id. You’ll need it in step 2.
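If you’re automating customer onboarding, the same request is easy to script. Below is a minimal sketch using only Python’s standard library; `build_user_payload` and `create_federated_user` are hypothetical helpers (not part of any Baseten SDK), and the field names mirror the curl example above:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.baseten.co/v1/gateway"

def build_user_payload(customer_id, slug, rate_limits, usage_limits):
    """Assemble the create-user body shown in the curl example above."""
    return {
        "customer_id": customer_id,
        "models": [{
            "slug": slug,
            "rate_limits": rate_limits,
            "usage_limits": usage_limits,
        }],
    }

def create_federated_user(payload):
    """POST the payload and return the new user's internal id."""
    req = urllib.request.Request(
        f"{BASE_URL}/users",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]  # pass this as user_id in step 2
```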

Step 2: Mint an API key for the user

Issue a new API key under the federated user with POST /v1/gateway/users/{user_id}/api_keys. The key inherits the user’s full model set by default; rate and usage limits live on the user, not the key.
curl --request POST \
  --url https://api.baseten.co/v1/gateway/users/abc123hash/api_keys \
  --header "Authorization: Api-Key $BASETEN_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "prod-key-1"
  }'
The response contains the plaintext key, returned exactly once:
{
  "api_key": "aBcDeFg.<api-key-secret>",
  "prefix": "aBcDeFg",
  "name": "prod-key-1",
  "models": [
    {
      "slug": "your-org/your-model",
      "rate_limits": [
        { "type": "TOKEN", "unit": "MINUTE", "threshold": 1000000 },
        { "type": "REQUEST", "unit": "MINUTE", "threshold": 100 }
      ],
      "usage_limits": [
        { "type": "TOKEN", "unit": "DAY", "threshold": 10000000 }
      ]
    }
  ]
}
This is the only time the key is returned in plaintext. Save it now: Baseten doesn’t store the secret portion and can’t show it to you again. If you lose it, revoke the key and mint a new one.
The string before the . (here, aBcDeFg) is the prefix. You’ll use the prefix, not the full key, when fetching or revoking the key later.
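Since the plaintext key is shown only once, it’s worth splitting off and recording the prefix at mint time. A small sketch, assuming only the prefix.secret shape shown in the response above; `split_api_key` is an illustrative helper, not a library function:

```python
def split_api_key(plaintext_key):
    """Split a freshly minted key into (prefix, secret).

    The prefix (everything before the first '.') is what you pass when
    fetching or revoking the key later; the secret is never shown again,
    so store it securely now.
    """
    prefix, _, secret = plaintext_key.partition(".")
    if not secret:
        raise ValueError("expected a key of the form '<prefix>.<secret>'")
    return prefix, secret

prefix, secret = split_api_key("aBcDeFg.example-secret-portion")
# prefix == "aBcDeFg"
```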

Step 3: Call your model through the gateway

Use the API key from step 2 to call your model. Frontier Gateway is OpenAI-compatible, so the OpenAI SDK works with the gateway base URL. Replace YOUR_API_KEY in the examples below with the value you saved from the mint-key response.
Install the OpenAI SDK:
pip install openai
Make a chat completion request:
chat.py
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="your-org/your-model",
    messages=[{"role": "user", "content": "Hello, world!"}],
)

print(response.choices[0].message.content)
The base URL is https://inference.baseten.co/v1 today. Once white-label routing is provisioned for your workspace, the base URL becomes the branded domain you configure with your Baseten team, and your downstream customers call your domain instead.
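Requests that exceed the per-minute rate limits configured in step 1 will be rejected, which the OpenAI SDK surfaces as a RateLimitError. One way to smooth this over is client-side exponential backoff. The sketch below is generic and illustrative: `with_backoff` is a hypothetical helper, and the retry counts and delays are arbitrary defaults you’d tune for your workload:

```python
import random
import time

def with_backoff(call, retries=5, base_delay=1.0,
                 is_rate_limit=lambda exc: True, sleep=time.sleep):
    """Retry `call` with exponential backoff plus jitter when rate-limited.

    `is_rate_limit` decides which exceptions are worth retrying; anything
    else (or the final failed attempt) is re-raised to the caller.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limit(exc) or attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)

# Usage (illustrative): wrap the chat completion call from step 3.
# response = with_backoff(
#     lambda: client.chat.completions.create(
#         model="your-org/your-model",
#         messages=[{"role": "user", "content": "Hello, world!"}],
#     ),
#     is_rate_limit=lambda exc: isinstance(exc, openai.RateLimitError),
# )
```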

Next steps

  • Manage API keys: Walk the full federated lifecycle of upserting users, minting and revoking keys, and soft-deleting users.
  • Rate and usage limits: Tune per-user, per-model token and request thresholds.
  • Billing webhooks: Stream signed per-request usage events into your billing pipeline.