By default, every deployment serves POST `/predict` with arbitrary JSON. If you want your deployment to also support additional HTTP routes, define the matching methods on your `Model` class.
This page covers the built-in HTTP routes available from standard Truss model code. If you deploy a custom Docker container, Baseten can forward requests to any route exposed by the underlying server. See Custom Docker containers.
Use these methods when you want custom Python logic but still want clients to call your model through the server’s built-in HTTP endpoints.
Which method to implement
| Method | Endpoint | Use it for |
|---|---|---|
| `chat_completions` | `/v1/chat/completions` | Chat-style payloads with a `messages` array. |
| `completions` | `/v1/completions` | Prompt-style payloads with a `prompt` field. |
| `embeddings` | `/v1/embeddings` | Embedding requests from text or token inputs. |
| `messages` | `/v1/messages` | Server-specific message payloads exposed by your deployment. |
| `responses` | `/v1/responses` | Server-specific response payloads exposed by your deployment. |
API families
| Endpoint | Family |
|---|---|
| `/v1/chat/completions` | OpenAI-style chat completions |
| `/v1/completions` | OpenAI-style text completions |
| `/v1/embeddings` | OpenAI-style embeddings |
| `/v1/responses` | OpenAI-style responses |
| `/v1/messages` | Anthropic-style messages |
chat_completions
Implement `chat_completions` when your model should accept chat requests.
model/model.py
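A minimal sketch of such a method. The echo logic and the specific response fields shown here are illustrative assumptions, not a required schema:

```python
# model/model.py -- illustrative sketch; the echo logic and response
# fields are examples, not a required schema.
class Model:
    def load(self):
        # A real deployment would load model weights and a tokenizer here.
        pass

    def chat_completions(self, model_input: dict) -> dict:
        messages = model_input["messages"]
        # Echo the last message back in an OpenAI-style response shape.
        reply = messages[-1]["content"]
        return {
            "object": "chat.completion",
            "choices": [
                {
                    "index": 0,
                    "message": {"role": "assistant", "content": reply},
                    "finish_reason": "stop",
                }
            ],
        }
```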
`model_input` typically includes fields like:
- `messages`
- `model`
- `stream`
- sampling parameters such as `temperature` and `max_tokens`
If you already have a `predict` method that handles the same payload shape, `chat_completions` can simply delegate to it.
completions
Implement `completions` when your model should accept prompt-style completion requests.
model/model.py
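A minimal sketch, where the stub continuation stands in for real text generation:

```python
# model/model.py -- illustrative sketch; the continuation logic is a
# placeholder for a real text-generation model.
class Model:
    def completions(self, model_input: dict) -> dict:
        prompt = model_input["prompt"]
        max_tokens = model_input.get("max_tokens", 16)
        # A real model would generate a continuation of `prompt`;
        # here we truncate a stub string to at most `max_tokens` chars.
        continuation = " (generated continuation)"[:max_tokens]
        return {
            "object": "text_completion",
            "choices": [
                {"index": 0, "text": continuation, "finish_reason": "length"}
            ],
        }
```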
Use `completions` for workloads such as autocomplete, prompt continuation, or fine-tuned models that are designed to extend text instead of following chat-style instructions.
embeddings, messages, and responses
Implement `embeddings`, `messages`, or `responses` when your deployment should expose those HTTP endpoints from custom model code.
model/model.py
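A minimal sketch of two of these methods. The fixed-size zero vectors and the hard-coded reply are placeholders for real inference:

```python
# model/model.py -- illustrative sketch; the zero vectors and hard-coded
# reply stand in for real model inference.
class Model:
    def embeddings(self, model_input: dict) -> dict:
        inputs = model_input["input"]
        if isinstance(inputs, str):
            inputs = [inputs]
        # OpenAI-style embeddings response: one vector per input.
        return {
            "object": "list",
            "data": [
                {"object": "embedding", "index": i, "embedding": [0.0] * 8}
                for i in range(len(inputs))
            ],
        }

    def messages(self, model_input: dict) -> dict:
        # Anthropic-style /v1/messages response shape.
        return {
            "role": "assistant",
            "content": [{"type": "text", "text": "Hello!"}],
            "stop_reason": "end_turn",
        }
```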
Each method maps to its own `/v1/*` route, so your implementation can return whatever JSON shape that endpoint expects.
`messages` maps to the Anthropic-style `/v1/messages` route; `embeddings` and `responses` map to the OpenAI-style `/v1/embeddings` and `/v1/responses` routes.
Request and response expectations
- These methods receive the parsed JSON payload as `model_input`.
- If you include a second argument annotated as `fastapi.Request`, you can inspect disconnects or request metadata just like in `predict`. See Request handling.
- Return JSON that matches the endpoint you expose. Baseten does not automatically convert an arbitrary `predict` response into a different response object for custom model code.
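A sketch of the optional `fastapi.Request` second argument. `is_disconnected` is the standard Starlette/FastAPI request method; the string annotation keeps the import confined to type checking, and the response shape is illustrative:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import fastapi


class Model:
    async def chat_completions(
        self, model_input: dict, request: "fastapi.Request"
    ) -> dict:
        # Stop early if the client has already disconnected.
        if await request.is_disconnected():
            return {}
        # Echo the last message in an OpenAI-style shape (illustrative).
        reply = model_input["messages"][-1]["content"]
        return {
            "choices": [{"message": {"role": "assistant", "content": reply}}]
        }
```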
Endpoint paths
When these methods are defined, your deployment can serve the matching HTTP routes in addition to `/predict`.
In the endpoint URL, replace `{env}` with `production`. For development deployments, use `development`.