By default, the Truss server wraps the prediction results of your custom model into a response object that is sent back to the client via HTTP.

In advanced use cases you might want to create these response objects yourself. Example use cases include:

  • Control over the HTTP status codes.
  • With streaming responses, you can use server-sent events (SSEs).
Request objects are likewise supported.
import fastapi

class Model:
    def predict(self, inputs) -> fastapi.Response:
        return fastapi.Response(...)

You can return a response from either predict or postprocess; any subclass of starlette.responses.Response is supported.

If you return a response from predict, the postprocess method cannot be used.

SSE / Streaming example


import time

from starlette.responses import StreamingResponse

class Model:
    def predict(self, model_input):
        def event_stream():
            while True:
                time.sleep(1)
                yield ("data: Server Time: "
                       f"{time.strftime('%Y-%m-%d %H:%M:%S')}\n\n")

        return StreamingResponse(event_stream(), media_type="text/event-stream")

Response headers are not fully propagated to the client. Include all relevant information in the response body itself.