Truss processes client requests by extracting and validating payloads. For advanced use cases, you can access the raw request object to:
- Customize payload deserialization (e.g., binary protocol buffers).
- Handle client disconnections and cancel long-running predictions.
You can mix request objects with standard inputs or use requests exclusively for performance optimization.
## Using Request Objects in Truss
You can use request objects in `preprocess`, `predict`, and `postprocess`:
```python
import fastapi


class Model:
    def preprocess(self, request: fastapi.Request):
        ...

    def predict(self, inputs, request: fastapi.Request):
        ...

    def postprocess(self, inputs, request: fastapi.Request):
        ...
```
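For instance, a model that accepts only the request can deserialize a custom binary payload itself. A minimal sketch, assuming the client posts raw bytes (the protobuf message `MyInput` mentioned in the comment is hypothetical):

```python
import fastapi


class Model:
    async def predict(self, request: fastapi.Request):
        # `request` is the only argument, so Truss hands over the raw request
        # without parsing the payload itself.
        raw_bytes = await request.body()
        # Hypothetical custom deserialization, e.g. a compiled protobuf message:
        # payload = MyInput.FromString(raw_bytes)
        return {"num_bytes_received": len(raw_bytes)}
```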
## Rules for Using Requests
- The request must be type-annotated as `fastapi.Request`.
- If only requests are used, Truss skips payload extraction for better performance.
- If both request objects and standard inputs are used:
  - The request must be the second argument.
  - Preprocessing transforms the inputs, but the request object remains unchanged.
- `postprocess` cannot receive only the request; it must be given the model's output.
- If `predict` uses only the request, `preprocess` cannot be used.
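To illustrate the mixed case, here is a minimal sketch (the payload fields are illustrative): `preprocess` transforms only the parsed inputs, while the untouched request object is passed to `predict` as the second argument.

```python
import fastapi


class Model:
    def preprocess(self, inputs):
        # Transforms only the parsed payload; the request is not affected.
        return {"text": inputs["text"].strip().lower()}

    def predict(self, inputs, request: fastapi.Request):
        # `inputs` is the preprocessed payload; `request` is the original request.
        return {"echo": inputs["text"], "method": request.method}
```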
The example below checks `request.is_disconnected()` before and during streaming generation and stops work once the client has gone away:

```python
import asyncio
import logging

import fastapi


class Model:
    async def predict(self, inputs, request: fastapi.Request):
        await asyncio.sleep(1)
        if await request.is_disconnected():
            logging.warning("Cancelled before generation.")
            # Cancel the request on the model engine here.
            return
        for i in range(5):
            await asyncio.sleep(1.0)
            logging.warning(i)
            yield str(i)  # Streaming response.
            if await request.is_disconnected():
                logging.warning("Cancelled during generation.")
                # Cancel the request on the model engine here.
                return
```
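To exercise the disconnect path, you can drop the connection from a client mid-stream. A sketch using `httpx` against a locally served model (the endpoint URL follows Truss's local serving convention; adjust host, port, and payload as needed):

```python
import asyncio

import httpx


async def main():
    url = "http://localhost:8080/v1/models/model:predict"
    async with httpx.AsyncClient() as client:
        async with client.stream("POST", url, json={"prompt": "hi"}) as resp:
            async for chunk in resp.aiter_text():
                print(chunk)
                break  # Disconnect after the first chunk.
    # Exiting the `stream` context closes the connection, so the server's
    # next `request.is_disconnected()` check returns True.


asyncio.run(main())
```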
You must implement request cancellation at the model level; how to do so varies by framework.
## Cancelling Requests in Specific Frameworks

### TRT-LLM (Polling-Based Cancellation)
For TensorRT-LLM, use `response_iterator.cancel()` to terminate a streaming request:
```python
async for request_output in response_iterator:
    if await is_cancelled_fn():
        logging.info("Request cancelled. Cancelling Triton request.")
        response_iterator.cancel()
        return
```
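The `is_cancelled_fn` callable above is not provided by TensorRT-LLM. One way to supply it, in the polling spirit of this approach, is a throttled wrapper around `request.is_disconnected` (an illustrative helper, not a Truss or Triton API):

```python
import time

import fastapi


def make_is_cancelled_fn(request: fastapi.Request, interval_s: float = 0.5):
    """Polls `request.is_disconnected()` at most once per `interval_s` seconds."""
    last_check = 0.0
    cancelled = False

    async def is_cancelled_fn() -> bool:
        nonlocal last_check, cancelled
        now = time.monotonic()
        if not cancelled and now - last_check >= interval_s:
            last_check = now
            cancelled = await request.is_disconnected()
        return cancelled

    return is_cancelled_fn
```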
### vLLM (Abort API)
For vLLM, use `engine.abort()` to stop processing:
```python
async for request_output in results_generator:
    if await request.is_disconnected():
        await engine.abort(request_id)
        return
```
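For context, the `request_id` that `abort()` needs is the one passed to `generate()`. A sketch of the full loop, assuming `self.engine` is a vLLM `AsyncLLMEngine` set up in `load()` (the prompt field and sampling parameters are illustrative):

```python
import uuid

import fastapi
from vllm import SamplingParams


class Model:
    async def predict(self, inputs, request: fastapi.Request):
        request_id = str(uuid.uuid4())
        results_generator = self.engine.generate(
            inputs["prompt"], SamplingParams(max_tokens=256), request_id
        )
        async for request_output in results_generator:
            if await request.is_disconnected():
                await self.engine.abort(request_id)
                return
            yield request_output.outputs[0].text  # Cumulative text so far.
```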
## Unsupported Request Features
- Streaming file uploads: pass URLs instead of embedding large data in the request (see the sketch after this list).
- Client-side headers: most headers are stripped, so include any necessary metadata in the payload.
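Instead of a streaming upload, the client can send a URL that the model downloads itself. A minimal sketch using `httpx` (the payload field `url` is illustrative):

```python
import httpx


class Model:
    async def preprocess(self, inputs):
        # Download the data referenced by the payload instead of receiving
        # it as an upload in the request body.
        async with httpx.AsyncClient() as client:
            resp = await client.get(inputs["url"])
            resp.raise_for_status()
        return {"data": resp.content}
```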