Setup
To get started, sign in to Baseten with Truss and then install the `websockets` library.
Sign in to Baseten: `truss login`
Install websockets: `pip install websockets`
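You can confirm that both packages are present in the active Python environment with the standard library's `importlib.metadata`. This is a minimal sketch. It assumes Truss and `websockets` were installed with pip into the same environment you will run the client from.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report each package the setup steps rely on.
    for name in ("truss", "websockets"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'NOT INSTALLED'}")
```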
Hardware
H100_40GB × 1
Engine
vLLM (latest build)
Write the config
Create and move into the project directory, then create `config.yaml` and paste the following:
config.yaml
Flags
The `start_command` passes these flags to the engine. Each one controls a runtime or serving behavior:
| Flag | Value | What it does |
|---|---|---|
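| `--compilation-config` | `{"cudagraph_mode":"PIECEWISE"}` | Configures vLLM compilation passes (op fusion, dead-code elimination) and CUDA graph capture. `PIECEWISE` captures CUDA graphs for the pieces of the model between attention layers, with attention running outside the graphs. |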
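Because the flag value is JSON, it has to survive shell parsing when `start_command` runs. The sketch below assumes the start command is interpreted by a POSIX shell. It uses Python's `json` and `shlex` modules to serialize and quote the value, then checks that a shell-style split recovers the exact JSON. It builds only the flag fragment, not the preset's full start command.

```python
import json
import shlex

# Settings passed to vLLM's --compilation-config flag (value from the flags table).
compilation_config = {"cudagraph_mode": "PIECEWISE"}

# Serialize compactly so the value matches the table: no spaces after separators.
flag_value = json.dumps(compilation_config, separators=(",", ":"))

# Quote for a POSIX shell; unquoted braces and double quotes would be mangled.
fragment = f"--compilation-config {shlex.quote(flag_value)}"
print(fragment)  # prints: --compilation-config '{"cudagraph_mode":"PIECEWISE"}'

# Round-trip check: splitting like a shell gives back the flag and the exact JSON.
flag, value = shlex.split(fragment)
assert flag == "--compilation-config"
assert json.loads(value) == compilation_config
```

Single-quoting the whole value is the simplest choice here because the JSON contains no single quotes; `shlex.quote` also handles the case where it does.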
Deploy
Push the config to Baseten with `truss push` from the project directory. Your model ID is the string after `/models/` in the logs URL (for example, `abcd1234`). Use it wherever you see `{model_id}` in the next section.
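If you script deployments, you can pull the model ID out of the logs URL by taking the path segment that follows `models`. This is a minimal standard-library sketch. The example URL is hypothetical and the function only assumes the ID sits directly after a `/models/` path segment, as described above. `model_id_from_logs_url` is an illustrative helper, not part of Truss.

```python
from urllib.parse import urlparse


def model_id_from_logs_url(url: str) -> str:
    """Return the path segment that follows `models` in a logs URL.

    Assumes a URL shaped like .../models/<model_id>/...;
    raises ValueError if there is no such segment.
    """
    # Drop empty segments produced by leading/trailing or doubled slashes.
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        index = segments.index("models")
    except ValueError:
        raise ValueError(f"no /models/ segment in {url!r}") from None
    if index + 1 >= len(segments):
        raise ValueError(f"no model ID after /models/ in {url!r}")
    return segments[index + 1]


# Hypothetical logs URL; only the /models/<id>/ part matters here.
print(model_id_from_logs_url("https://app.baseten.co/models/abcd1234/logs/xyz987"))
```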
Call the model
This preset exposes a WebSocket streaming endpoint at `/v1/realtime` for low-latency, incremental transcription. See the streaming transcription API reference for the message protocol, a Python client example, and supported audio formats.
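Incremental transcription usually means sending audio in small, fixed-duration pieces rather than as one upload. The sketch below splits raw PCM bytes into such chunks. It assumes 16 kHz, mono, 16-bit PCM and 100 ms chunks. These are illustrative values, not this endpoint's documented requirements, so check the supported audio formats and message framing in the streaming transcription API reference before sending anything over `/v1/realtime`.

```python
from typing import Iterator


def chunk_bytes_for(duration_ms: int, sample_rate: int = 16_000,
                    channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Number of bytes in `duration_ms` of uncompressed PCM audio."""
    frame_bytes = channels * bytes_per_sample          # one sample per channel
    frames = sample_rate * duration_ms // 1000         # whole frames only
    return frames * frame_bytes


def iter_pcm_chunks(pcm: bytes, duration_ms: int = 100, sample_rate: int = 16_000,
                    channels: int = 1, bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive `duration_ms` chunks of `pcm`; the last may be shorter.

    Chunk size is a multiple of the frame size, so no sample is split in two.
    Each chunk would then be wrapped in whatever message the streaming
    protocol expects (see the API reference); that framing is not shown here.
    """
    size = chunk_bytes_for(duration_ms, sample_rate, channels, bytes_per_sample)
    if size <= 0:
        raise ValueError("chunk duration too short for this sample rate")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# 250 ms of silence at 16 kHz mono 16-bit is 8000 bytes -> 100 ms chunks of 3200 bytes.
silence = bytes(chunk_bytes_for(250))
print([len(c) for c in iter_pcm_chunks(silence)])  # prints: [3200, 3200, 1600]
```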