ChatCompletions endpoint migration guide

Follow this step-by-step guide to use the OpenAI-compatible bridge endpoint.

Use this endpoint with the OpenAI Python client and any compatible model deployed on Baseten.

https://bridge.baseten.co/v1
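
For example, you can point the standard OpenAI Python client at the bridge when constructing it. This is a minimal sketch assuming the openai package v1.x and that your Baseten API key, stored here in a BASETEN_API_KEY environment variable (an assumption, adjust to your setup), is accepted as the OpenAI API key:

import os
from openai import OpenAI

# Point the standard OpenAI client at the Baseten bridge
client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],  # assumption: Baseten API key in this env var
    base_url="https://bridge.baseten.co/v1",
)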

Parameters

Pay special attention to the Baseten-specific arguments that must be passed to the bridge via the extra_body argument.

model
string
required

The name of the model you want to call, such as "mistral-7b".

messages
list
required

A list of dictionaries containing the chat history to complete.
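
For example, a minimal chat history (the contents here are placeholders) looks like:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What even is AGI?"},
]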

max_tokens
integer
required

The maximum number of tokens to generate.

stream
boolean

Set stream=True to stream model output.

temperature
float

Controls the randomness of the output. Lower values make responses more deterministic; higher values make them more varied.

top_p
float

Nucleus sampling threshold, used as an alternative to temperature. The model considers only the smallest set of tokens whose cumulative probability exceeds top_p.

presence_penalty
float

Increases or decreases the model’s likelihood of talking about new topics.

extra_body
dict
required

A Python dictionary for passing extra arguments to the request.

extra_body.baseten
dict
required

A dictionary of Baseten-specific parameters to pass to the bridge.

extra_body.baseten.model_id
string
required

The string identifier for the target model.

extra_body.baseten.deployment_id
string

The string identifier for the target deployment. If deployment_id is not provided, the production deployment is used.
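
Putting the parameters together, a complete request might look like the sketch below. The model name, prompt, and MODEL_ID are placeholders for your own deployment, and client is the bridge-configured client from above:

response = client.chat.completions.create(
    model="mistral-7b",  # placeholder model name
    messages=[
        {"role": "user", "content": "What even is AGI?"},
    ],
    max_tokens=256,
    extra_body={
        "baseten": {
            "model_id": "MODEL_ID",  # placeholder: your model's ID
            # "deployment_id": "DEPLOYMENT_ID",  # optional; defaults to production
        }
    },
)

print(response.choices[0].message.content)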

Output

The output matches the ChatCompletions API output format, with two caveats:

  1. The output id is just a UUID. Baseten API requests are stateless, so this ID is not meaningful.
  2. Values for the usage dictionary are not calculated and are set to 0. Baseten charges for compute directly rather than charging for inference by token.
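
To see both caveats concretely, you can inspect the response object from the sketch above; these are standard ChatCompletions attributes:

print(response.id)                  # a UUID; requests are stateless
print(response.usage.total_tokens)  # always 0; Baseten bills for compute, not tokens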

Streaming

You can also stream the model response by passing stream=True to the client.chat.completions.create() call. To parse the streamed output, run:

for chunk in response:
    # Each chunk carries an incremental delta; content may be None on role-only or final chunks
    print(chunk.choices[0].delta.content or "", end="")
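
For completeness, the streaming call itself differs from the earlier sketch only by the stream=True flag (the model name and MODEL_ID remain placeholders):

response = client.chat.completions.create(
    model="mistral-7b",  # placeholder model name
    messages=[{"role": "user", "content": "Tell me a story."}],
    max_tokens=512,
    stream=True,  # stream tokens back as they are generated
    extra_body={"baseten": {"model_id": "MODEL_ID"}},  # placeholder ID
)

Iterate over this response with the loop above to print tokens as they arrive.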