ChatCompletions (deprecated)
ChatCompletions endpoint migration guide
Follow this step-by-step guide to use the OpenAI-compatible bridge endpoint. Use this endpoint with the OpenAI Python client and any compatible model deployed on Baseten.
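As a sketch, client setup might look like the following. The base URL `https://bridge.baseten.co/v1` and the `BASETEN_API_KEY` environment variable are assumptions for illustration; substitute the bridge URL and API key from your Baseten workspace.

```python
import os

from openai import OpenAI

# Point the OpenAI client at the Baseten bridge.
# The base_url below is an assumption for illustration; use the
# bridge URL given in your Baseten workspace documentation.
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url="https://bridge.baseten.co/v1",
)
```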
Parameters
Special attention should be given to the Baseten-specific arguments that must be passed to the bridge via the `extra_body` argument (see the request sketch after the parameter list).
- `model`: The name of the model you want to call, such as `"mistral-7b"`.
- `messages`: A list of dictionaries containing the chat history to complete.
- `max_tokens`: The maximum number of tokens to generate.
- `stream`: Set `stream=True` to stream model output.
- `temperature`: How deterministic to make the model.
- `top_p`: Alternative to `temperature`.
- `presence_penalty`: Increases or decreases the model's likelihood to talk about new topics.
- `extra_body`: Python dictionary that enables extra arguments to be supplied to the request.
  - `baseten`: Baseten-specific parameters that should be passed to the bridge, supplied as a dictionary inside `extra_body`.
    - `model_id`: The string identifier for the target model.
    - `deployment_id`: The string identifier for the target deployment. When `deployment_id` is not provided, the production deployment will be used.
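A minimal request sketch using the `client` configured above; the model ID `"abcd1234"` is a placeholder for your deployment's actual ID:

```python
# A sketch of a bridge request; "abcd1234" is a placeholder model ID.
response = client.chat.completions.create(
    model="mistral-7b",
    messages=[{"role": "user", "content": "What is a Mistral?"}],
    max_tokens=256,
    temperature=0.7,
    extra_body={
        "baseten": {
            "model_id": "abcd1234",
            # deployment_id is optional; omitting it targets the
            # production deployment.
        }
    },
)
print(response.choices[0].message.content)
```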
Output
The output will match the ChatCompletions API output format, with two caveats:
- The output `id` is just a UUID. Baseten API requests are stateless, so this ID would not be meaningful.
- Values for the `usage` dictionary are not calculated and are set to `0`. Baseten charges for compute directly rather than charging for inference by token.
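As a sketch, the two caveats show up like this when inspecting a response from the bridge (field values are illustrative):

```python
# The id is a stateless UUID rather than a meaningful request ID.
print(response.id)  # e.g. "21a41dcc-5b3c-4bc8-9bf4-8a8f7a1e9b1f"

# usage values are not calculated by the bridge and come back as 0.
print(response.usage)
# e.g. CompletionUsage(completion_tokens=0, prompt_tokens=0, total_tokens=0)
```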
Streaming
You can also stream your model response by passing `stream=True` to the `client.chat.completions.create()` call. To parse your output, iterate over the streamed chunks; a minimal sketch with the OpenAI Python client follows: