Engine control in Python

Use `model.py` to customize engine behavior
When you create a new Truss with `truss init`, it creates two files: `config.yaml` and `model/model.py`. While you configure the Engine Builder in `config.yaml`, you can use `model/model.py` to access and control the engine object during inference.
You have two options:
- Delete the `model/model.py` file, and your TensorRT-LLM engine will run according to its base spec.
- Update the code in `model/model.py` to support TensorRT-LLM.
You must either update `model/model.py` to accept `trt_llm` as an argument to the `__init__` method or delete the file. Otherwise, deployment will fail with an error, because the default `model/model.py` is not written for TensorRT-LLM.
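For instance, a minimal `model/model.py` that satisfies this requirement simply accepts `trt_llm` in `__init__` and delegates every request to the engine. This is a sketch, not the exact Engine Builder API: the `engine` attribute name and the engine's async `predict` signature are assumptions based on the description above.

```python
class Model:
    def __init__(self, trt_llm, **kwargs):
        # Keep a handle on the engine (assumed to be exposed as an
        # attribute of the `trt_llm` argument) for use in predict().
        self._engine = trt_llm.engine

    def load(self):
        # Runs once on server start-up; nothing extra to set up here.
        pass

    async def predict(self, model_input):
        # Runs per request: pass the input straight through to the
        # TensorRT-LLM engine unchanged.
        return await self._engine.predict(model_input)
```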
The `engine` object is a property of the `trt_llm` argument and must be initialized in `__init__` so it can be accessed in `load()` (which runs once on server start-up) and `predict()` (which runs for each request handled by the server).
This example applies a chat template with the Llama 3.1 8B tokenizer to the model prompt:
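A sketch of what that `model/model.py` could look like. The engine attribute name, its async `predict` signature, the tokenizer repository name, and the shape of the incoming request (a `messages` list) are all assumptions for illustration, not confirmed by this page.

```python
class Model:
    def __init__(self, trt_llm, **kwargs):
        # Hold on to the engine (assumed `trt_llm.engine`) so that
        # load() and predict() can use it.
        self._engine = trt_llm.engine
        self._tokenizer = None

    def load(self):
        # Runs once on server start-up: load the Llama 3.1 8B tokenizer.
        # Imported here so the module can be loaded before dependencies
        # are available; the repository name is an assumption.
        from transformers import AutoTokenizer

        self._tokenizer = AutoTokenizer.from_pretrained(
            "meta-llama/Llama-3.1-8B-Instruct"
        )

    async def predict(self, model_input):
        # Runs per request: turn the list of chat messages into a single
        # prompt string via the tokenizer's chat template, then hand the
        # rewritten request to the engine.
        messages = model_input.pop("messages")
        model_input["prompt"] = self._tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        return await self._engine.predict(model_input)
```

With this in place, clients send `messages` in the request body and the engine still receives the plain `prompt` string it expects.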