- Model ID: Found in the Baseten dashboard or returned when you deploy.
- API key: Authenticates your requests.
- JSON-serializable model input: The data your model expects.
Authentication
Include your API key in the `Authorization` header:
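A minimal sketch of building an authenticated request in Python with only the standard library. The model ID and API key are placeholders, and the `/environments/production/predict` path is assumed from the URL pattern shown later in this page:

```python
import os
import urllib.request

# Placeholders: substitute your own model ID and API key.
model_id = "abcd1234"
api_key = os.environ.get("BASETEN_API_KEY", "YOUR_API_KEY")

# Assumed URL shape, following the production-environment pattern below.
url = f"https://model-{model_id}.api.baseten.co/environments/production/predict"

# Baseten expects the key in the Authorization header with the "Api-Key" scheme.
request = urllib.request.Request(
    url,
    headers={"Authorization": f"Api-Key {api_key}"},
)

print(request.get_header("Authorization"))
```

Sending the request (for example with `urllib.request.urlopen(request)`) then authenticates against your deployment.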
Predict API endpoints
Baseten provides multiple endpoints for different inference modes:
- `/predict`: Standard synchronous inference.
- `/async_predict`: Asynchronous inference for long-running tasks.
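The two modes differ only in the final path segment, so a deployment's URLs can be sketched like this (the model ID is a placeholder, and the production-environment path follows the pattern used elsewhere on this page):

```python
# Placeholder model ID for illustration.
model_id = "abcd1234"
base = f"https://model-{model_id}.api.baseten.co/environments/production"

predict_url = f"{base}/predict"              # synchronous inference
async_predict_url = f"{base}/async_predict"  # asynchronous, for long-running tasks

print(predict_url)
print(async_predict_url)
```

Both endpoints accept the same JSON-serializable model input; the async variant returns immediately with a request identifier instead of blocking until inference completes.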
Sync API endpoints
Custom servers support both `predict` endpoints as well as a special sync endpoint. The sync endpoint lets you call the different routes defined in your custom server.
- `https://model-{model_id}.../sync/health` -> `/health`
- `https://model-{model_id}.../sync/items` -> `/items`
- `https://model-{model_id}.../sync/items/123` -> `/items/123`
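The mapping above is a simple prefix rewrite: everything after `/sync` is forwarded to the matching route on your custom server. A hedged sketch, assuming the production-environment URL pattern shown later on this page (the model ID is a placeholder):

```python
def sync_url(model_id: str, route: str) -> str:
    """Build the public sync URL that proxies to `route` on a custom server."""
    # Assumed domain and path shape for the production environment.
    return f"https://model-{model_id}.api.baseten.co/environments/production/sync{route}"

print(sync_url("abcd1234", "/health"))
# https://model-abcd1234.api.baseten.co/environments/production/sync/health
print(sync_url("abcd1234", "/items/123"))
# https://model-abcd1234.api.baseten.co/environments/production/sync/items/123
```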
OpenAI SDK
When you deploy a model with Engine-Builder, you get an OpenAI-compatible server. If you are already using one of the OpenAI SDKs, simply update the base URL to your Baseten model URL and include your Baseten API key.
External LLM gateways
Any LLM gateway that speaks the OpenAI protocol, such as LiteLLM or OpenRouter, can route traffic to a Baseten deployment. Configure the gateway with three values:
- Base URL: `https://model-{model_id}.api.baseten.co/environments/production/sync/v1`, using the model ID for your deployment. Click API endpoint on the model page in the Baseten dashboard to copy the full URL.
- Model name: The value of `--served-model-name` from your deployment's `start_command`. See the vLLM example for where this is set. When a single gateway routes to several deployments, use an `org/model` naming convention (for example, `acme/llama-3-70b`) to keep routing unambiguous.
- API key: A Baseten API key with access to the deployment.
The gateway then sends requests to `{base_url}/chat/completions` with `model` set to the served model name and an `Authorization: Api-Key <key>` header.
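The request a gateway (or any OpenAI-style client) sends can be sketched with the standard library alone. The model ID, served model name, and API key below are placeholder assumptions:

```python
import json
import urllib.request

# Placeholders: substitute your deployment's values.
model_id = "abcd1234"
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1"

payload = {
    "model": "acme/llama-3-70b",  # must match --served-model-name
    "messages": [{"role": "user", "content": "Hello!"}],
}

request = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Api-Key YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(request.full_url)
```

Because the endpoint speaks the OpenAI protocol, an OpenAI SDK client pointed at the same `base_url` with the same API key produces an equivalent request.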
Alternative invocation methods
- Truss CLI: `truss predict`
- Model dashboard: "Playground" button in the Baseten UI