> ## Documentation Index
> Fetch the complete documentation index at: https://docs.baseten.co/llms.txt
> Use this file to discover all available pages before exploring further.

# HTTP endpoints

> Expose server HTTP endpoints from custom model code.

Custom Truss models normally serve `POST /predict` with arbitrary JSON. If you want your deployment to also support additional HTTP routes, define the matching methods on your `Model` class.

This page covers the built-in HTTP routes available from standard Truss model code. If you deploy a custom Docker container, Baseten can forward requests to any route exposed by the underlying server. See [Custom Docker containers](/development/model/custom-server).

Use these methods when you want custom Python logic but still want clients to call your model through the server's built-in HTTP endpoints.

## Which method to implement

| Method             | Endpoint               | Use it for                                                    |
| ------------------ | ---------------------- | ------------------------------------------------------------- |
| `chat_completions` | `/v1/chat/completions` | Chat-style payloads with a `messages` array.                  |
| `completions`      | `/v1/completions`      | Prompt-style payloads with a `prompt` field.                  |
| `embeddings`       | `/v1/embeddings`       | Embedding requests from text or token inputs.                 |
| `messages`         | `/v1/messages`         | Server-specific message payloads exposed by your deployment.  |
| `responses`        | `/v1/responses`        | Server-specific response payloads exposed by your deployment. |

You can implement any subset of these methods, depending on the interface you want to expose.

## API families

| Endpoint               | Family                        |
| ---------------------- | ----------------------------- |
| `/v1/chat/completions` | OpenAI-style chat completions |
| `/v1/completions`      | OpenAI-style text completions |
| `/v1/embeddings`       | OpenAI-style embeddings       |
| `/v1/responses`        | OpenAI-style responses        |
| `/v1/messages`         | Anthropic-style messages      |

This page uses HTTP endpoints as the umbrella term because Truss can expose endpoints from more than one API family.

## chat\_completions

Implement `chat_completions` when your model should accept chat requests.

```python model/model.py theme={"system"}
from typing import Any, Dict

class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        pass

    async def predict(self, model_input: Dict[str, Any]):
        return {"output": model_input}

    async def chat_completions(self, model_input: Dict[str, Any], request):
        # Reuse your main inference path so /predict and /v1/chat/completions stay aligned.
        return await self.predict(model_input)
```

The request body follows the chat schema, so `model_input` typically includes fields like:

* `messages`
* `model`
* `stream`
* sampling parameters such as `temperature` and `max_tokens`

If you already have a `predict` method that handles the same payload shape, `chat_completions` can simply delegate to it.

## completions

Implement `completions` when your model should accept prompt-style completion requests.

```python model/model.py theme={"system"}
from typing import Any, Dict

class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        pass

    async def completions(self, model_input: Dict[str, Any], request):
        prompt = model_input["prompt"]
        return {
            "id": "cmpl-example",
            "object": "text_completion",
            "choices": [
                {
                    "index": 0,
                    "text": f"You sent: {prompt}",
                    "finish_reason": "stop",
                }
            ],
        }
```

Use `completions` for workloads such as autocomplete, prompt continuation, or fine-tuned models that are designed to extend text instead of following chat-style instructions.

## embeddings, messages, and responses

Implement `embeddings`, `messages`, or `responses` when your deployment should expose those HTTP endpoints from custom model code.

```python model/model.py theme={"system"}
from typing import Any, Dict

class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        pass

    def embeddings(self, model_input: Dict[str, Any], request):
        return {"output": "embeddings"}

    def messages(self, model_input: Dict[str, Any], request):
        return {"output": "messages"}

    def responses(self, model_input: Dict[str, Any], request):
        return {"output": "responses"}
```

These methods are forwarded directly to the matching `/v1/*` route, so your implementation can return whatever JSON shape that endpoint expects.

`messages` maps to the Anthropic-style `/v1/messages` route. `embeddings` and `responses` map to OpenAI-style `/v1/embeddings` and `/v1/responses` routes.

## Request and response expectations

* These methods receive the parsed JSON payload as `model_input`.
* If you include a second argument annotated as `fastapi.Request`, you can inspect disconnects or request metadata just like in `predict`. See [Request handling](/development/model/requests).
* Return JSON that matches the endpoint you expose. Baseten does not automatically convert an arbitrary `predict` response into a different response object for custom model code.

## Endpoint paths

When these methods are defined, your deployment can serve the matching HTTP routes in addition to `/predict`.

```text theme={"system"}
/environments/{env}/sync/v1/chat/completions
/environments/{env}/sync/v1/completions
/environments/{env}/sync/v1/embeddings
/environments/{env}/sync/v1/messages
/environments/{env}/sync/v1/responses
```

For production, replace `{env}` with `production`. For development deployments, use `development`.

## Related pages

* [Custom model code](/development/model/custom-model-code)
* [Implementation](/development/model/implementation)
* [Streaming output](/development/model/streaming)
* [Request handling](/development/model/requests)
