Enable real-time, streaming, bidirectional communication using WebSockets for Truss models and Chainlets.
In a Truss model, the websocket method handles all processing, and all input/output communication goes through the WebSocket object (not through arguments and return values). There are no separate preprocess, predict, and postprocess methods anymore, but you can still implement load.
Replace the predict method with a websocket method in your Truss in model/model.py.
Set runtime.transport.kind=websocket in config.yaml.
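As a minimal sketch of what such a model/model.py could look like, the class below echoes each text message back with a suffix. It assumes the object passed to websocket exposes FastAPI-style async receive_text, send_text, and close methods. The "bye" message, the FakeWebSocket class, and run_demo are hypothetical helpers added only so the sketch runs locally without a server; handling of client disconnects is omitted.

```python
import asyncio


class Model:
    def load(self):
        # Optional one-time setup (for example, loading weights) still goes here.
        self.suffix = " pong"

    async def websocket(self, websocket):
        # The connection is already accepted: do not call websocket.accept().
        while True:
            text = await websocket.receive_text()
            if text == "bye":  # hypothetical end-of-session message
                break
            await websocket.send_text(text + self.suffix)
        # Closing explicitly is optional; it also happens after this method exits.
        await websocket.close()


class FakeWebSocket:
    """Hypothetical in-memory stand-in, only for trying the model locally."""

    def __init__(self, incoming):
        self.incoming = list(incoming)
        self.sent = []
        self.closed = False

    async def receive_text(self):
        return self.incoming.pop(0)

    async def send_text(self, text):
        self.sent.append(text)

    async def close(self):
        self.closed = True


def run_demo():
    model = Model()
    model.load()
    ws = FakeWebSocket(["ping", "hello", "bye"])
    asyncio.run(model.websocket(ws))
    return ws
```

Calling run_demo() drives the method with three queued messages and returns the fake socket, so the replies can be inspected in ws.sent.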
A WebSocket model cannot also define predict, preprocess, or postprocess.
The WebSocket object passed to your websocket method has already accepted the connection, so you must not call websocket.accept() on it. You may close the connection at the end of your processing with websocket.close(). If you don’t close it explicitly, it is closed after your websocket method exits.
Using websocat, you can call the model at its WebSocket endpoint:
For an environment: wss://model-{MODEL_ID}.api.baseten.co/environments/{ENVIRONMENT_NAME}/websocket
For a specific deployment: wss://model-{MODEL_ID}.api.baseten.co/deployment/{DEPLOYMENT_NAME}/websocket
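The two endpoint patterns above differ only in their path, so a small helper can build either one. This is a sketch, not part of any SDK: the function name websocket_url and its parameters are hypothetical, and the model ID and environment name in the usage line are placeholders.

```python
def websocket_url(model_id, environment=None, deployment=None):
    """Build the wss:// endpoint for an environment or a specific deployment."""
    # Exactly one target must be given, since the two paths are alternatives.
    if (environment is None) == (deployment is None):
        raise ValueError("pass exactly one of environment or deployment")
    base = f"wss://model-{model_id}.api.baseten.co"
    if environment is not None:
        return f"{base}/environments/{environment}/websocket"
    return f"{base}/deployment/{deployment}/websocket"
```

For example, websocket_url("abc123", environment="production") returns "wss://model-abc123.api.baseten.co/environments/production/websocket", which can then be passed to a WebSocket client.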
In a Chainlet, the WebSocket is exposed as a WebSocketProtocol. All processing happens in the run_remote method as usual, but inputs and outputs (or “return values”) are sent through the WebSocket object using the async send_{X} and receive_{X} methods (there are variants for text, bytes, and json).
When using WebSockets, run_remote() takes the connection as an argument of type WebSocketProtocol (it is essentially the same as fastapi.WebSocket, but you cannot accept the connection, because inside the Chainlet the connection is already accepted).
The return type is None (if you return data to the client, send it through the WebSocket itself).
[…] runtime.transport.kind.
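The shape of a WebSocket run_remote method can be sketched without the Chains framework. In the block below, WebSocketLike is a local stand-in that mirrors only the text variants of the send/receive methods described above; it is not the real WebSocketProtocol. The Shout class, the "done" message, InMemorySocket, and run_chain_demo are likewise hypothetical, and the Chainlet base class and entrypoint wiring are omitted.

```python
import asyncio
from typing import List, Protocol


class WebSocketLike(Protocol):
    """Local stand-in for the connection type; text variants only."""

    async def receive_text(self) -> str: ...
    async def send_text(self, data: str) -> None: ...


class Shout:
    # In a real Chain this would be a Chainlet class; the base class is omitted.
    async def run_remote(self, websocket: WebSocketLike) -> None:
        # Do not accept the connection here: it is already accepted.
        while True:
            text = await websocket.receive_text()
            if text == "done":  # hypothetical end-of-session message
                return  # returns None; replies were already sent over the socket
            await websocket.send_text(text.upper())


class InMemorySocket:
    """Hypothetical in-memory connection used to exercise run_remote locally."""

    def __init__(self, incoming: List[str]) -> None:
        self.incoming = list(incoming)
        self.sent: List[str] = []

    async def receive_text(self) -> str:
        return self.incoming.pop(0)

    async def send_text(self, data: str) -> None:
        self.sent.append(data)


def run_chain_demo() -> List[str]:
    socket = InMemorySocket(["hi", "there", "done"])
    result = asyncio.run(Shout().run_remote(socket))
    assert result is None  # nothing is returned; output travels over the socket
    return socket.sent
```

The point of the sketch is the contract: the method's only input is the connection, every reply is sent through it, and the return value is None.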
Using websocat, you can call the chain in the same way.
Scale-up occurs once replicas reach maxConcurrency - 1 concurrent WebSocket connections, at which point the total number of replicas is incremented, until the maxReplica setting is hit.
Scale-down occurs when the number of replicas is greater than minReplica and there are replicas with 0 concurrent connections. At this point, we begin scaling down idle replicas one by one.
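The scaling rules can be illustrated with a toy decision function. This is only a sketch under stated assumptions, not the platform's scheduler: it assumes scale-up triggers when every replica holds at least maxConcurrency - 1 connections, and that each decision changes the replica count by one. The function name desired_replicas and its parameters are hypothetical.

```python
def desired_replicas(connections, max_concurrency, min_replica, max_replica):
    """Return the next replica count given per-replica connection counts."""
    replicas = len(connections)
    threshold = max_concurrency - 1
    # Assumed scale-up rule: all replicas are at the threshold and there is headroom.
    if connections and all(c >= threshold for c in connections) and replicas < max_replica:
        return replicas + 1
    # Scale-down rule: more replicas than the minimum and at least one is idle.
    if replicas > min_replica and any(c == 0 for c in connections):
        return replicas - 1
    return replicas
```

With max_concurrency=4, two replicas holding 3 connections each would grow to three replicas (if max_replica allows), while replicas holding 2 and 0 connections would shrink to one when min_replica is 1.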
Some other scheduling factors to consider when using WebSockets: