ChatCompletions
Use this endpoint with the OpenAI Python client and any deployment of a compatible model on Baseten. If you’re serving a vLLM model, this endpoint supports that model out of the box.
This endpoint aims for feature parity with OpenAI Chat Completions. If you find an unsupported feature, don’t hesitate to reach out!
Calling the model
https://bridge.baseten.co/v1/direct
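For example, a minimal non-streaming call with the OpenAI Python client might look like the sketch below (the API key and model ID are placeholders):

```python
from openai import OpenAI

# Point the OpenAI client at the Baseten bridge endpoint.
# The API key and model ID below are placeholder values.
client = OpenAI(
    api_key="YOUR_BASETEN_API_KEY",
    base_url="https://bridge.baseten.co/v1/direct",
)

response = client.chat.completions.create(
    model="baseten/abcd1234",  # hypothetical model_id
    messages=[{"role": "user", "content": "What is a large language model?"}],
)
print(response.choices[0].message.content)
```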
Parameters
Parameters supported by the OpenAI ChatCompletions request can be found in the OpenAI documentation.
Special attention should be paid to the `model` parameter: a string formatted as `baseten/{model_id}[/{deployment_id}]`. The `deployment_id` is optional. When `deployment_id` is not provided, the production deployment will be used.
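Both forms of the model string are valid; the IDs below are hypothetical examples:

```python
# Targets the production deployment of the model.
model = "baseten/abcd1234"

# Targets a specific deployment explicitly.
model = "baseten/abcd1234/wxyz5678"
```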
Output
Streaming and non-streaming responses are both supported. The vLLM OpenAI Server is a good example of how to serve model results in either mode.
For streaming outputs, the data format must comply with the Server-Sent Events (SSE) format. A helpful example for JSON payloads can be found here.
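As a sketch, streaming with the OpenAI Python client (reusing the client from the example above; the model ID is again a placeholder) looks like this, with the client parsing the SSE stream into chunks:

```python
stream = client.chat.completions.create(
    model="baseten/abcd1234",  # hypothetical model_id
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

# Each chunk is a parsed SSE event; print tokens as they arrive.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```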
Best Practices
- Pin your `openai` package version in your `requirements.txt` file. This helps avoid breaking changes introduced through package upgrades; see the example after this list.
- If you must make breaking changes to your Truss server (e.g., to introduce a new feature), first publish a new model deployment, then update your API call on the client side.
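For example, in `requirements.txt` (the version number is illustrative; pin whichever version you have tested against):

```
openai==1.35.0
```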