> ## Documentation Index > Fetch the complete documentation index at: https://docs.baseten.co/llms.txt > Use this file to discover all available pages before exploring further. # Call your model > Run inference on deployed models This page covers calling self-deployed models with your workspace API key. For hosted open-source models with no deployment step, see [Model APIs](/inference/model-apis/overview). Once deployed, your model is accessible through an [API endpoint](/reference/inference-api/overview). To make an inference request, you'll need: * **Model ID**: Found in the Baseten dashboard or returned when you deploy. * **[API key](/organization/api-keys)**: Authenticates your requests. * **JSON-serializable model input**: The data your model expects. The model ID (and the deployment ID, when you need to target a specific deployment) comes from the model's page URL in your workspace: Anatomy of the deployment page URL. In app.baseten.co/models/abc123/deployments/def456, abc123 is the model ID and def456 is the deployment ID.

Anatomy of the deployment page URL. In app.baseten.co/models/abc123/deployments/def456, abc123 is the model ID and def456 is the deployment ID.

We recommend server-side calls to your model. Client-side code may expose your Baseten API key. Dedicated deployment endpoints don't currently include CORS response headers, so browser-based calls may be blocked. ## Authentication The predict endpoint lives on your model's own subdomain: Anatomy of the model API endpoint. In https://model-abc123.api.baseten.co/environments/production/predict, abc123 is the model ID and production is the environment that serves the request.

Anatomy of the model API endpoint. In https://model-abc123.api.baseten.co/environments/production/predict, abc123 is the model ID and production is the environment that serves the request.

Include your API key in the `Authorization` header: ```bash Request theme={"system"} curl -X POST https://model-YOUR_MODEL_ID.api.baseten.co/environments/production/predict \ -H "Authorization: Bearer $BASETEN_API_KEY" \ -H "Content-Type: application/json" \ -d '{"prompt": "Hello, world!"}' ``` In Python with requests: ```python predict.py theme={"system"} import requests import os api_key = os.environ["BASETEN_API_KEY"] model_id = "YOUR_MODEL_ID" response = requests.post( f"https://model-{model_id}.api.baseten.co/environments/production/predict", headers={"Authorization": f"Bearer {api_key}"}, json={"prompt": "Hello, world!"}, ) print(response.json()) ``` Baseten also accepts the legacy `Authorization: Api-Key ` scheme on every endpoint, so existing scripts continue to work: ```bash Request theme={"system"} curl -X POST https://model-YOUR_MODEL_ID.api.baseten.co/environments/production/predict \ -H "Authorization: Api-Key $BASETEN_API_KEY" \ -H "Content-Type: application/json" \ -d '{"prompt": "Hello, world!"}' ``` ## Predict API endpoints Baseten provides multiple endpoints for different inference modes: * [`/predict`](/reference/inference-api/overview#predict-endpoints): Standard synchronous inference. * [`/async_predict`](/reference/inference-api/overview#predict-endpoints): Asynchronous inference for long-running tasks. Endpoints are available for environments and all deployments. See the [API reference](/reference/inference-api/overview) for details. ## Sync API endpoints Custom servers support both `predict` endpoints and a special `sync` endpoint. Use the `sync` endpoint to call different routes in your custom server: ```text URL theme={"system"} https://model-{model-id}.api.baseten.co/environments/{production}/sync/{route} ``` These examples show how the sync endpoint maps to the custom server's routes: * `https://model-{model_id}.../sync/health` -> `/health` * `https://model-{model_id}.../sync/items` -> `/items` * `https://model-{model_id}.../sync/items/123` -> `/items/123` ## OpenAI SDK When you deploy a model with Engine-Builder, you'll get an OpenAI-compatible server. If you already use one of the OpenAI SDKs, update the base URL to your Baseten model URL and include your Baseten API key: ```python openai_client.py theme={"system"} import os from openai import OpenAI model_id = "abcdef" # TODO: replace with your model id api_key = os.environ.get("BASETEN_API_KEY") model_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1" client = OpenAI( base_url=model_url, api_key=api_key, ) stream = client.chat.completions.create( model="Qwen/Qwen2.5-3B-Instruct", # must match --served-model-name in the deployment messages=[ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"} ], stream=True, ) for chunk in stream: if chunk.choices[0].delta.content is not None: print(chunk.choices[0].delta.content, end="") ``` ## External LLM gateways Any LLM gateway that speaks the OpenAI protocol, such as LiteLLM or OpenRouter, can route traffic to a Baseten deployment. Configure the gateway with three values: * **Base URL**: `https://model-{model_id}.api.baseten.co/environments/production/sync/v1`, using the model ID for your deployment. Choose **API endpoint** on the model page in the Baseten dashboard to copy the full URL. * **Model name**: The value of `--served-model-name` from your deployment's `start_command`. See the [vLLM example](/examples/vllm) for where this is set. When a single gateway routes to several deployments, use an `org/model` naming convention (for example, `acme/llama-3-70b`) to keep routing unambiguous. * **API key**: A [Baseten API key](/organization/api-keys) with access to the deployment. The gateway sends requests to `{base_url}/chat/completions` with `model` set to the served model name and an `Authorization: Bearer ` header. ## Alternative invocation methods * **Baseten CLI**: [`baseten model predict`](/reference/cli/baseten/model) * **Model Dashboard**: "Playground" button in the Baseten UI