Truss processes client requests by extracting and validating payloads. For advanced use cases, you can access the raw request object to:
- Customize payload deserialization (for example, binary protocol buffers).
- Handle disconnections and cancel long-running predictions.
Using request objects in Truss
You can define request objects in `preprocess`, `predict`, and `postprocess`:
Rules for using requests
- The request must be type-annotated as `fastapi.Request`.
- If only requests are used, Truss skips payload extraction for better performance.
- If both request objects and standard inputs are used:
  - The request must be the second argument.
  - Preprocessing transforms inputs, but the request object remains unchanged.
- `postprocess` can't use only the request; it must receive the model's output.
- If `predict` only uses the request, `preprocess` cannot be used.
Cancelling requests in specific frameworks
TRT-LLM (polling-based cancellation)
For TensorRT-LLM, use `response_iterator.cancel()` to terminate streaming requests:
See full example in TensorRT-LLM Docs.
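A minimal polling loop might look like this; the iterator and request objects are stand-ins, and only the `cancel()` call is taken from the docs above. See the linked TensorRT-LLM docs for the full example.

```python
# Hypothetical polling-based cancellation loop for a TRT-LLM stream:
# check the client's disconnect state between chunks, and cancel the
# response iterator when the client goes away. Identifiers are assumed.
async def generate(request, response_iterator):
    async for chunk in response_iterator:
        if await request.is_disconnected():
            response_iterator.cancel()  # terminate the TRT-LLM stream
            break
        yield chunk
```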
vLLM (abort API)
For vLLM, use `engine.abort()` to stop processing:
See full example in vLLM Docs.
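A hedged sketch of the abort pattern, assuming an async engine whose `generate` yields outputs for a given `request_id` (the exact vLLM signatures may differ; see the linked vLLM docs):

```python
# Hypothetical sketch: stream vLLM outputs and abort the in-flight
# request when the client disconnects. Engine API names are assumptions.
async def stream_with_abort(engine, request, prompt, request_id):
    async for output in engine.generate(prompt, request_id=request_id):
        if await request.is_disconnected():
            await engine.abort(request_id)  # stop vLLM processing
            return
        yield output
```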
Unsupported request features
- Streaming file uploads: Use URLs instead of embedding large data in the request.
- Client-side headers: Most headers are stripped; include necessary metadata in the payload.