- Customize payload deserialization (e.g., binary protocol buffers).
- Handle client disconnections and cancel long-running predictions.
You can mix request objects with standard inputs, or use request objects exclusively so that Truss skips payload parsing for better performance.
Using Request Objects in Truss
You can declare a request parameter in preprocess, predict, and postprocess.
Rules for Using Requests
- The request must be type-annotated as fastapi.Request.
- If only requests are used, Truss skips payload extraction for better performance.
- If both request objects and standard inputs are used:
  - The request must be the second argument, following the inputs.
  - Preprocessing transforms the inputs, but the request object remains unchanged.
- postprocess cannot take only the request; it must receive the model's output.
- If predict only uses the request, preprocess cannot be used.
Request cancellation must be implemented in your model code, and the mechanism varies by inference framework.
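Whatever the framework, the model first needs to notice that the client has gone away. The following is a framework-agnostic sketch. It assumes only that the request exposes an async is_disconnected() method, as fastapi.Request does. The hypothetical helper run_cancellable runs the inference coroutine as an asyncio task, polls the connection every poll_interval seconds (also hypothetical), and cancels the task on disconnect.

Cancelling an asyncio task only stops the Python coroutine. An engine doing its work in other threads or processes keeps running unless you also call its own cancellation API, which is what the framework-specific sections below cover.

```python
import asyncio
from typing import Awaitable, Optional, Protocol, TypeVar

T = TypeVar("T")


class SupportsDisconnect(Protocol):
    # fastapi.Request satisfies this; the Protocol keeps the sketch framework-agnostic.
    async def is_disconnected(self) -> bool: ...


async def run_cancellable(
    work: Awaitable[T],
    request: SupportsDisconnect,
    poll_interval: float = 0.1,
) -> Optional[T]:
    """Return the result of `work`, or None if the client disconnected first."""
    task = asyncio.ensure_future(work)
    while not task.done():
        if await request.is_disconnected():
            task.cancel()
            # Let the cancellation finish; asyncio.wait does not raise.
            await asyncio.wait({task})
            return None
        # Wake up when the task finishes or when the next poll is due.
        await asyncio.wait({task}, timeout=poll_interval)
    return task.result()
```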
Cancelling Requests in Specific Frameworks
TRT-LLM (Polling-Based Cancellation)
For TensorRT-LLM, use response_iterator.cancel() to terminate streaming requests.
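The polling pattern can be sketched as an async generator. It checks the connection before forwarding each streamed chunk and calls cancel() on the iterator once the client is gone. The Protocol classes describe an assumed shape for response_iterator: an async iterator of text chunks with a cancel() method. How you obtain that iterator from TensorRT-LLM is not shown and depends on your setup. In a Truss model, predict could, for example, return this generator to stream the response.

```python
from typing import AsyncIterator, Protocol


class SupportsDisconnect(Protocol):
    # fastapi.Request provides this coroutine method.
    async def is_disconnected(self) -> bool: ...


class CancellableResponseIterator(Protocol):
    # Assumed shape: an async iterator of text chunks that also exposes cancel().
    def __aiter__(self) -> AsyncIterator[str]: ...

    def cancel(self) -> None: ...


async def stream_until_disconnect(
    response_iterator: CancellableResponseIterator,
    request: SupportsDisconnect,
) -> AsyncIterator[str]:
    async for chunk in response_iterator:
        # Poll the connection once per chunk, before forwarding it.
        if await request.is_disconnected():
            # Ask the engine to stop generating, then end the stream.
            response_iterator.cancel()
            break
        yield chunk
```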
See the full example in the TensorRT-LLM docs.
vLLM (Abort API)
For vLLM, use engine.abort() to stop processing.
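vLLM aborts work by request ID. The sketch therefore generates a unique ID per call, passes it to generate(), and awaits engine.abort(request_id) when the client disconnects. The AbortableEngine Protocol is an assumption modeled on vLLM's async engine, where generate(prompt, sampling_params, request_id) yields outputs and abort() is a coroutine. Check these signatures against the vLLM version you run.

```python
import uuid
from typing import Any, AsyncIterator, Optional, Protocol


class SupportsDisconnect(Protocol):
    # fastapi.Request provides this coroutine method.
    async def is_disconnected(self) -> bool: ...


class AbortableEngine(Protocol):
    # Assumed shape, modeled on vLLM's async engine.
    def generate(
        self, prompt: str, sampling_params: Any, request_id: str
    ) -> AsyncIterator[Any]: ...

    async def abort(self, request_id: str) -> None: ...


async def generate_or_abort(
    engine: AbortableEngine,
    prompt: str,
    sampling_params: Any,
    request: SupportsDisconnect,
) -> Optional[Any]:
    """Return the final engine output, or None if the client disconnected."""
    # Unique per call, so abort() targets only this request.
    request_id = uuid.uuid4().hex
    final_output = None
    async for output in engine.generate(prompt, sampling_params, request_id):
        if await request.is_disconnected():
            # Tell the engine to stop work for this request ID.
            await engine.abort(request_id)
            return None
        # Each output is assumed to supersede the previous one; keep the latest.
        final_output = output
    return final_output
```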
See the full example in the vLLM docs.
Unsupported Request Features
- Streaming file uploads: use URLs instead of embedding large data in the request.
- Client-side headers: most headers are stripped, so include necessary metadata in the payload.
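Instead of uploading a file or relying on a custom header, the client can send a small payload that carries a URL and the metadata the model needs. The sketch below uses hypothetical field names (file_url, metadata). parse_payload validates the payload, and fetch downloads the file with the standard library's urllib from inside the model, for example in preprocess.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlparse
from urllib.request import urlopen


def parse_payload(inputs: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Extract the file URL and metadata from a hypothetical payload shape."""
    url = inputs["file_url"]
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {url!r}")
    # Metadata a client might otherwise send as headers travels in the payload.
    metadata = inputs.get("metadata", {})
    return url, metadata


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the referenced file inside the model instead of receiving it inline."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


# Example payload a client might send (hypothetical values).
payload = {
    "file_url": "https://example.com/audio.wav",
    "metadata": {"user_id": "abc123"},
}
```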