from openai import OpenAI
import os

model_id = "abcd1234" # Replace with your model ID

client = OpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    base_url=f"https://bridge.baseten.co/{model_id}/v1"
)

# Send the chat history; the model answers the final user message.
response = client.chat.completions.create(
    model="mistral-7b",
    messages=[
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)

# Print only the text of the assistant's reply.
print(response.choices[0].message.content)
{
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "message": {
        "content": "The 2020 World Series was played in Texas at Globe Life Field in Arlington.",
        "role": "assistant"
      }
    }
  ],
  "created": 1700584611,
  "id": "chatcmpl-eedbac8f-f68d-4769-a1a7-a1c550be8d08",
  "model": "abcd1234",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}

ChatCompletions endpoint migration guide

Follow this step-by-step guide to use the OpenAI-compatible bridge endpoint.

Use this endpoint with the OpenAI Python client and the production deployment of a compatible model on Baseten.

https://bridge.baseten.co/{model_id}/v1

Parameters

model
string
required

The name of the model you want to call, such as "mistral-7b".

messages
list
required

A list of dictionaries containing the chat history to complete.

max_tokens
integer
required

The maximum number of tokens to generate.

stream
boolean

Set stream=True to stream model output.

temperature
float

Controls how deterministic the model's output is. Lower values make the output more deterministic; higher values make it more random.

top_p
float

Nucleus sampling threshold, an alternative to temperature for controlling randomness.

presence_penalty
float

Increases or decreases the model’s likelihood of talking about new topics.
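
As a rough sketch of how the parameters above fit together, the helper below assembles keyword arguments for client.chat.completions.create(). The name build_chat_request and every default value (including max_tokens=512) are assumptions made for illustration, not part of the Baseten or OpenAI APIs.

```python
def build_chat_request(messages, model="mistral-7b", max_tokens=512,
                       stream=False, temperature=None, top_p=None,
                       presence_penalty=None):
    """Assemble keyword arguments for client.chat.completions.create().

    Hypothetical helper; the default values are illustrative assumptions.
    """
    if not messages:
        raise ValueError("messages must contain at least one entry")

    request = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "stream": stream,
    }

    # Include optional sampling parameters only when they were explicitly set.
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "presence_penalty": presence_penalty,
    }
    request.update({name: value for name, value in optional.items() if value is not None})
    return request


request = build_chat_request(
    [{"role": "user", "content": "Where was the 2020 World Series played?"}],
    temperature=0.7,
)
print(request)
```

With a client configured for the bridge endpoint, the result can be passed along as client.chat.completions.create(**request).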

Output

The output matches the ChatCompletions API output format (see the example response) with two caveats:

  1. The output id is just a UUID. Baseten API requests are stateless, so this ID would not be meaningful.
  2. Values for the usage dictionary are not calculated and are set to 0. Baseten charges for compute directly rather than charging for inference by token.
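
One way to account for these caveats in client code is sketched below. The helper name extract_reply is hypothetical, and the stand-in object only mimics the attribute layout of the example response so the snippet runs without a live call.

```python
from types import SimpleNamespace


def extract_reply(response):
    """Return (reply_text, total_tokens) from a chat completion.

    total_tokens is None when every usage count is 0, because the bridge
    sets these values to 0 instead of calculating them.
    """
    text = response.choices[0].message.content
    usage = response.usage
    counts = (usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
    total_tokens = usage.total_tokens if any(counts) else None
    return text, total_tokens


# Stand-in shaped like the example response (an assumption for illustration).
sample = SimpleNamespace(
    id="chatcmpl-eedbac8f-f68d-4769-a1a7-a1c550be8d08",
    choices=[
        SimpleNamespace(
            index=0,
            message=SimpleNamespace(
                role="assistant",
                content="The 2020 World Series was played in Texas at Globe Life Field in Arlington.",
            ),
        )
    ],
    usage=SimpleNamespace(completion_tokens=0, prompt_tokens=0, total_tokens=0),
)

text, total_tokens = extract_reply(sample)
print(text)
print(total_tokens)
```

Treating all-zero usage as missing keeps downstream logging or cost tracking from recording a misleading count of 0 tokens.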

Streaming

You can also stream your model response by passing stream=True to the client.chat.completions.create() call. To parse your output, run:

for chunk in response:
    print(chunk.choices[0].delta)
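
To assemble the streamed pieces into one string, a loop along the following lines may help. It is a sketch that assumes chunks follow the OpenAI Python client's shape, where delta.content can be None on some chunks; collect_stream and fake_chunk are hypothetical names, and the fake stream stands in for a real response.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text carried by each streamed chunk."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # Skip chunks that carry no choices.
        piece = chunk.choices[0].delta.content
        if piece:  # Skip None or empty deltas.
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in chunk with the same attribute layout (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk(None),
    fake_chunk("Globe Life "),
    fake_chunk("Field"),
    fake_chunk(None),
]
full_text = collect_stream(fake_stream)
print(full_text)
```

With a live call, pass the object returned by client.chat.completions.create(..., stream=True) to collect_stream in place of fake_stream.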