ChatCompletions endpoint migration guide
Follow this step-by-step guide to use the OpenAI-compatible bridge endpoint. Use this endpoint with the OpenAI Python client and the production deployment of a compatible model deployed on Baseten.
https://bridge.baseten.co/{model_id}/v1
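The bridge URL above can be assembled as a sketch like the following. The `model_id` value `"abc123"` is a placeholder, not a real deployment ID; substitute your own, and authenticate with your Baseten API key.

```python
# Placeholder model_id; replace with your deployment's actual ID.
model_id = "abc123"
base_url = f"https://bridge.baseten.co/{model_id}/v1"

# With the openai package installed, the client would be configured like:
#   from openai import OpenAI
#   client = OpenAI(api_key="YOUR_BASETEN_API_KEY", base_url=base_url)
print(base_url)
```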
Parameters
- model: The name of the model you want to call, such as "mistral-7b".
- messages: A list of dictionaries containing the chat history to complete.
- max_tokens: The maximum number of tokens to generate.
- stream: Set stream=True to stream model output.
- temperature: How deterministic to make the model.
- top_p: Alternative to temperature.
- presence_penalty: Increase or decrease the model's likelihood to talk about new topics.
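The parameters above can be sketched as a single request payload. The specific values here are illustrative, not recommendations:

```python
# Illustrative ChatCompletions payload using the parameters described above.
payload = {
    "model": "mistral-7b",                # name of the model to call
    "messages": [                          # chat history to complete
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 256,                     # cap on generated tokens
    "temperature": 0.7,                    # higher = less deterministic
    "top_p": 0.9,                          # nucleus sampling, alternative to temperature
    "presence_penalty": 0.0,               # nudges the model toward/away from new topics
    "stream": False,                       # set True to stream output
}
```

With the client configured against the bridge URL, these keys map directly onto `client.chat.completions.create(**payload)`.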
Output
The output will match the ChatCompletions API output format, with two caveats:
- The output id is just a UUID. Baseten API requests are stateless, so this ID would not be meaningful.
- Values in the usage dictionary are not calculated and are set to 0. Baseten charges for compute directly rather than charging for inference by token.
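A sketch of the response shape, with the two caveats above made concrete (the field values are illustrative, not output from a real call):

```python
import uuid

# Illustrative ChatCompletions-shaped response from the bridge:
# the id is an arbitrary UUID, and usage counts are fixed at 0.
response = {
    "id": str(uuid.uuid4()),
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}
print(response["usage"]["total_tokens"])  # always 0 on the bridge
```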
Streaming
You can also stream your model response by passing stream=True to the client.chat.completions.create() call. To parse your output, run:
for chunk in response:
    print(chunk.choices[0].delta)
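To reassemble the streamed text, you can accumulate each chunk's delta content. The sketch below uses stand-in chunk objects that mimic the streamed ChatCompletions shape, since a live stream requires a deployed model; in practice you would iterate over the result of `client.chat.completions.create(..., stream=True)` instead.

```python
from types import SimpleNamespace

# Stand-in chunks mimicking the shape of streamed ChatCompletions output.
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=part))])
    for part in ["Hel", "lo", "!"]
]

text = ""
for chunk in chunks:
    delta = chunk.choices[0].delta
    if delta.content:  # some chunks (e.g. the final one) may carry no content
        text += delta.content
print(text)  # Hello!
```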