- Customize payload deserialization (e.g., binary protocol buffers).
- Handle disconnections & cancel long-running predictions.
You can mix request objects with standard inputs or use requests exclusively for performance optimization.
Using Request Objects in Truss
You can define request objects in `preprocess`, `predict`, and `postprocess`:
Rules for Using Requests
- The request must be type-annotated as `fastapi.Request`.
- If only requests are used, Truss skips payload extraction for better performance.
- If both request objects and standard inputs are used:
  - The request must be the second argument.
  - Preprocessing transforms inputs, but the request object remains unchanged.
- `postprocess` cannot use only the request; it must receive the model's output.
- If `predict` only uses the request, `preprocess` cannot be used.
You must implement request cancellation at the model level, which varies by framework.
Cancelling Requests in Specific Frameworks
TRT-LLM (Polling-Based Cancellation)
For TensorRT-LLM, use `response_iterator.cancel()` to terminate streaming requests:
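A rough sketch of the polling pattern, not the official TensorRT-LLM example. The helper name `stream_until_cancelled` and the `is_cancelled` callable are hypothetical; `response_iterator` is assumed to be an async iterator that exposes `cancel()`, as described above:

```python
from typing import AsyncIterator, Awaitable, Callable


async def stream_until_cancelled(
    response_iterator,
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncIterator[str]:
    """Yield chunks, polling for cancellation before each one is sent."""
    async for chunk in response_iterator:
        if await is_cancelled():
            # Tell the engine to stop generating, then end the stream.
            response_iterator.cancel()
            return
        yield chunk
```

Inside a Truss model, `is_cancelled` could be the request's `is_disconnected` method, so the engine stops once the client goes away.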
See the full example in the TensorRT-LLM docs.
vLLM (Abort API)
For vLLM, use `engine.abort()` to stop processing:
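A rough sketch of the same idea for vLLM, not the official example. It assumes an engine shaped like vLLM's `AsyncLLMEngine`, where `generate(prompt, sampling_params, request_id)` is an async generator and `abort(request_id)` stops the request; the helper name and the `is_cancelled` callable are hypothetical, and the exact signatures should be checked against your vLLM version:

```python
import inspect
from typing import Any, AsyncIterator, Awaitable, Callable


async def generate_until_cancelled(
    engine: Any,
    prompt: str,
    sampling_params: Any,
    request_id: str,
    is_cancelled: Callable[[], Awaitable[bool]],
) -> AsyncIterator[Any]:
    """Stream engine outputs and abort the request once it is cancelled."""
    async for output in engine.generate(prompt, sampling_params, request_id):
        if await is_cancelled():
            # abort() may be sync or async depending on the engine class.
            result = engine.abort(request_id)
            if inspect.isawaitable(result):
                await result
            return
        yield output
```

As with the TensorRT-LLM sketch, `is_cancelled` could be the request's `is_disconnected` method when called from `predict`.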
See the full example in the vLLM docs.
Unsupported Request Features
- Streaming file uploads: use URLs instead of embedding large data in the request.
- Client-side headers: most headers are stripped, so include necessary metadata in the payload.