This page covers calling self-deployed models. For hosted open-source models with no deployment step, see Model APIs.

Once deployed, your model is accessible through an API endpoint. To make an inference request, you’ll need:
  • Model ID: Found in the Baseten dashboard or returned when you deploy.
  • API key: Authenticates your requests.
  • JSON-serializable model input: The data your model expects.

Authentication

Include your API key in the Authorization header:
curl -X POST https://model-YOUR_MODEL_ID.api.baseten.co/environments/production/predict \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'
In Python with requests:
import requests
import os

api_key = os.environ["BASETEN_API_KEY"]
model_id = "YOUR_MODEL_ID"

response = requests.post(
    f"https://model-{model_id}.api.baseten.co/environments/production/predict",
    headers={"Authorization": f"Api-Key {api_key}"},
    json={"prompt": "Hello, world!"},
)

print(response.json())

Predict API endpoints

Baseten provides multiple endpoints for different inference modes. Endpoints are available both for environments and for individual deployments. See the API reference for details.
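As a sketch of the difference, the snippet below builds an environment-scoped URL and a deployment-scoped URL for the same model. The IDs are hypothetical placeholders, and the deployment-scoped path shown here is an assumption — confirm the exact pattern in the API reference.

```python
import os
import requests

# Hypothetical placeholder IDs; replace with values from your Baseten dashboard.
model_id = os.environ.get("BASETEN_MODEL_ID", "abcdef")
deployment_id = os.environ.get("BASETEN_DEPLOYMENT_ID", "xyz123")

# Environment-scoped endpoint: always targets whatever is promoted to production.
env_url = f"https://model-{model_id}.api.baseten.co/environments/production/predict"

# Deployment-scoped endpoint: pins requests to one specific deployment.
deployment_url = f"https://model-{model_id}.api.baseten.co/deployment/{deployment_id}/predict"

def predict(url: str, payload: dict) -> dict:
    """POST a JSON payload to a predict endpoint and return the decoded response."""
    resp = requests.post(
        url,
        headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
        json=payload,
    )
    resp.raise_for_status()
    return resp.json()

# Only make a live call if an API key is actually configured.
if "BASETEN_API_KEY" in os.environ:
    print(predict(env_url, {"prompt": "Hello, world!"}))
```

Environment URLs are stable across promotions, so they are usually what you wire into applications; deployment URLs are useful for testing a specific deployment before promoting it.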

Sync API endpoints

Custom servers support the standard predict endpoints as well as a special sync endpoint. The sync endpoint lets you call arbitrary routes on your custom server:
https://model-{model_id}.api.baseten.co/environments/production/sync/{route}
Here are a few examples of how the sync endpoint maps to the custom server’s routes:
  • https://model-{model_id}.../sync/health -> /health
  • https://model-{model_id}.../sync/items -> /items
  • https://model-{model_id}.../sync/items/123 -> /items/123
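For instance, the mapping above means a custom server’s /health route is reachable through /sync/health. A minimal sketch, using a hypothetical placeholder model ID:

```python
import os
import requests

# Hypothetical placeholder; replace with your model ID.
model_id = os.environ.get("BASETEN_MODEL_ID", "abcdef")

# /sync/health on the Baseten URL forwards to /health on the custom server.
health_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync/health"

# Only make a live call if an API key is actually configured.
if "BASETEN_API_KEY" in os.environ:
    resp = requests.get(
        health_url,
        headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    )
    print(resp.status_code, resp.text)
```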

OpenAI SDK

When you deploy a model with the Engine Builder, you get an OpenAI-compatible server. If you’re already using one of the OpenAI SDKs, simply update the base URL to your Baseten model URL and include your Baseten API key.
import os
from openai import OpenAI

model_id = "abcdef" # TODO: replace with your model id
api_key = os.environ.get("BASETEN_API_KEY")
model_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1"

client = OpenAI(
    base_url=model_url,
    api_key=api_key,
)

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",  # must match --served-model-name in the deployment
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

External LLM gateways

Any LLM gateway that speaks the OpenAI protocol, such as LiteLLM or OpenRouter, can route traffic to a Baseten deployment. Configure the gateway with three values:
  • Base URL: https://model-{model_id}.api.baseten.co/environments/production/sync/v1, using the model ID for your deployment. Click API endpoint on the model page in the Baseten dashboard to copy the full URL.
  • Model name: The value of --served-model-name from your deployment’s start_command. See the vLLM example for where this is set. When a single gateway routes to several deployments, use an org/model naming convention (for example, acme/llama-3-70b) to keep routing unambiguous.
  • API key: A Baseten API key with access to the deployment.
The gateway sends requests to {base_url}/chat/completions with model set to the served model name and an Authorization: Api-Key <key> header.
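The request a gateway sends can be reproduced with a plain HTTP client, which is handy for verifying your three configuration values before wiring up the gateway itself. A sketch, again using a hypothetical placeholder model ID:

```python
import os
import requests

# Hypothetical placeholder; replace with your model ID.
model_id = os.environ.get("BASETEN_MODEL_ID", "abcdef")
base_url = f"https://model-{model_id}.api.baseten.co/environments/production/sync/v1"

payload = {
    "model": "Qwen/Qwen2.5-3B-Instruct",  # must match --served-model-name
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}

# Only make a live call if an API key is actually configured.
if "BASETEN_API_KEY" in os.environ:
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

If this request succeeds, the same base URL, model name, and API key should work unchanged in any OpenAI-protocol gateway.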

Alternative invocation methods

  • Truss CLI: truss predict
  • Model Dashboard: “Playground” button in the Baseten UI