Using request objects / cancellation
Get more control by directly using the request object.
Truss processes client requests by extracting and validating payloads. For advanced use cases, you can access the raw request object to:
- Customize payload deserialization (e.g., binary protocol buffers).
- Detect client disconnections and cancel long-running predictions.
You can mix request objects with standard inputs, or use the request exclusively so that Truss skips payload extraction.
Using Request Objects in Truss
You can add a request argument to `preprocess`, `predict`, and `postprocess`.
Rules for Using Requests
- The request must be type-annotated as `fastapi.Request`.
- If only requests are used, Truss skips payload extraction for better performance.
- If both request objects and standard inputs are used:
  - The request must be the second argument.
  - `preprocess` transforms the inputs, but the request object is passed through unchanged.
  - `postprocess` cannot use only the request; it must receive the model's output.
- If `predict` only uses the request, `preprocess` cannot be used.
You must implement request cancellation at the model level, and the mechanism depends on the inference framework.
Cancelling Requests in Specific Frameworks
TRT-LLM (Polling-Based Cancellation)
For TensorRT-LLM, use `response_iterator.cancel()` to terminate streaming requests.
See the full example in the TensorRT-LLM docs.
vLLM (Abort API)
For vLLM, use `engine.abort()` to stop processing.
See the full example in the vLLM docs.
Unsupported Request Features
- Streaming file uploads: pass a URL to the data in the payload instead of embedding large files in the request.
- Client-side headers: most headers are stripped, so include any metadata you need in the payload.
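A minimal sketch of the URL-based alternative, assuming hypothetical payload keys `file_url` and `metadata`: the client sends a link to the file plus any metadata it would otherwise have put in headers, and the model downloads the file itself with the standard library's `urllib.request.urlopen`.

```python
import urllib.request


class Model:
    def predict(self, model_input: dict) -> dict:
        # Hypothetical payload keys: a URL instead of the file bytes, plus metadata
        # that would otherwise have been sent as (stripped) headers.
        url = model_input["file_url"]
        metadata = model_input.get("metadata", {})
        # In production, validate or allow-list URLs before fetching them.
        with urllib.request.urlopen(url, timeout=30) as response:
            data = response.read()
        return {"num_bytes": len(data), "metadata": metadata}
```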