If an AI lab has given you a federated API key for their model, this guide shows you how to call that model through Baseten Frontier Gateway. The gateway is OpenAI-compatible, so any OpenAI SDK or HTTP client works with two changes: the base URL and the auth header.Documentation Index
Fetch the complete documentation index at: https://docs.baseten.co/llms.txt
Use this file to discover all available pages before exploring further.
Base URL
The default base URL is:https://api.your-lab.com/v1), use that URL instead. Your lab will tell you which URL to use; the request shape is the same.
Authentication
Pass your federated API key in theAuthorization header using the Api-Key scheme, not Bearer:
Authorization: Bearer ..., override it. Federated keys sent as Bearer tokens are rejected.
The key was issued to you by your lab through Baseten’s federated key management. You don’t manage rotation or limits; those are configured on the lab’s side. Treat the key like any other API secret: store it in an environment variable or secret manager, never in source control.
OpenAI SDK example
Make a chat completion request with the federated key your lab gave you. ReplaceYOUR_API_KEY with that key, and your-org/your-model with the model slug your lab gave you.
- Python
- JavaScript
Install the OpenAI SDK:Make a chat completion request:
chat.py
curl example
For raw HTTP usage:Model slug format
Model slugs are formatted asyour-org/your-model (for example, acme/llama-3-70b). Pass the slug as the model parameter on every request. Your lab will tell you which slug or slugs your key has access to; a single key can be authorized for one or more models.
Streaming, structured outputs, and tool calling
The gateway supports streaming, JSON-schema structured outputs, and tool calling through standard OpenAI parameters (stream, response_format, tools). The configuration and usage patterns are identical to any OpenAI-compatible endpoint:
- For more information on streaming responses, see Streaming.
- For more information on JSON-schema and structured generation, see Structured outputs.
- For more information on tool calling and function definitions, see Function calling.
Rate limits
Your federated key has rate and usage limits set by your lab. When a limit is exceeded, the gateway returns429 Too Many Requests. For more information on the limit shape, daily reset behavior, and 429 handling, see Rate and usage limits.