By default, Truss wraps prediction results into an HTTP response. For advanced use cases, you can create response objects manually to:
  • Control HTTP status codes.
  • Use server-sent events (SSEs) for streaming responses.
You can return a response from predict or postprocess, but not both.

Returning custom response objects

Any subclass of starlette.responses.Response is supported. This includes fastapi.Response, which FastAPI re-exports from Starlette.
import fastapi

class Model:
    def predict(self, inputs) -> fastapi.Response:
        return fastapi.Response(...)
If predict returns a response, postprocess cannot be used.

Example: Streaming with SSEs

For server-sent events (SSEs), use StreamingResponse:
import time
from starlette.responses import StreamingResponse

class Model:
    def predict(self, model_input):
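        def event_stream():
            # Loops forever, emitting one event per second.
            while True:
                time.sleep(1)
                # An SSE event is a "data:" line terminated by a blank line.
                yield f"data: Server Time: {time.strftime('%Y-%m-%d %H:%M:%S')}\n\n"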

        return StreamingResponse(event_stream(), media_type="text/event-stream")
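
On the client side, each event in this stream arrives as a "data:" line followed by a blank line. The sketch below shows one way to extract the payloads, assuming the client already has the response as an iterable of decoded text lines (for example, from an HTTP library's line iterator). The helper name iter_sse_data is hypothetical, and other SSE fields such as event: and id: are ignored for brevity.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event from an iterable of decoded lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Other fields (event:, id:, retry:) and comments are skipped here.


# Two events in the same shape the streaming model produces.
sample = [
    "data: Server Time: 2024-01-01 00:00:00\n",
    "\n",
    "data: Server Time: 2024-01-01 00:00:01\n",
    "\n",
]
events = list(iter_sse_data(sample))
```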

Limitations

  • Response headers aren’t fully propagated to the client, so put any metadata the client needs in the response body instead.
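
One way to work around this limitation is to wrap the output and its metadata in a single envelope in the body. This is a sketch under stated assumptions: the with_metadata helper, the envelope keys, and the model_version and latency_ms fields are all hypothetical, and the model returns a plain dict so that Truss wraps it into an HTTP response by default.

```python
import time


def with_metadata(output, **metadata):
    """Wrap a model output and its metadata in one JSON-serializable envelope."""
    return {"output": output, "metadata": metadata}


class Model:
    def predict(self, model_input):
        started = time.perf_counter()
        output = {"echo": model_input}  # placeholder for real inference
        elapsed_ms = round((time.perf_counter() - started) * 1000, 3)
        # Values that might otherwise travel in headers go in the body instead.
        return with_metadata(
            output,
            model_version="example-v1",  # hypothetical version label
            latency_ms=elapsed_ms,
        )
```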
Also see Using Request Objects for handling raw requests.