model/model.py file. To recap, the simplest
directory structure for a model is:
The model.py file contains a class with particular methods:
- The __init__ method is used to initialize the Model class, and allows you to read in configuration parameters and other information.
- The load method is where you define the logic for initializing the model. This might include downloading model weights, or loading them onto a GPU.
- The predict method is where you define the logic for inference.
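The three methods above can be sketched as a minimal class. This is an illustrative skeleton under stated assumptions rather than the exact listing for this page: the **kwargs signature, the model_input parameter name, and the placeholder upper-casing "model" are all assumptions made for the example.

```python
class Model:
    def __init__(self, **kwargs):
        # Assumption: accept arbitrary keyword arguments and keep them for later use.
        self._kwargs = kwargs
        self._model = None  # set by load()

    def load(self):
        # Placeholder for real initialization, such as downloading weights
        # or moving a model onto a GPU. Here the "model" just upper-cases text.
        self._model = lambda text: text.upper()

    def predict(self, model_input):
        # Placeholder inference: apply the stand-in model to the input text.
        return {"output": self._model(model_input["text"])}
```

Locally, the class can be exercised by constructing it, calling load once, and then calling predict with a dictionary that has a "text" key.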
__init__
As mentioned above, the __init__ method is used to initialize the Model class, and allows you to
read in configuration parameters and runtime information.
The simplest signature for __init__ is:
model.py
__init__ can also accept the following parameters:
- config: A dictionary containing the config.yaml for the model.
- data_dir: A string containing the path to the data directory for the model.
- secrets: A dictionary containing the secrets for the model. Note that at runtime, these will be populated with the actual values as stored on Baseten.
- environment: A string containing the environment for the model, if the model has been deployed to an environment.
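One way to use these parameters is to save them as attributes so that load and predict can read them later. The sketch below assumes all four parameters are declared, with the types described in the list above; the attribute names, the sample config contents, and the secret name are hypothetical.

```python
class Model:
    def __init__(self, config: dict, data_dir: str, secrets: dict, environment: str):
        # Save the runtime information as attributes for use in load and predict.
        self._config = config
        self._data_dir = data_dir
        self._secrets = secrets
        self._environment = environment


# Local construction with made-up values, for illustration only.
model = Model(
    config={"model_name": "example"},            # hypothetical config contents
    data_dir="/tmp/data",                        # hypothetical path
    secrets={"hf_access_token": "placeholder"},  # hypothetical secret name
    environment="production",
)
```

Keeping the values on the instance means load can, for example, look up a token in self._secrets without __init__ doing any heavy work itself.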
load
The load method is where you define the logic for initializing the model. As
mentioned before, this might include downloading model weights or loading them
onto the GPU.
Unlike the other methods mentioned, load does not accept any parameters.
A deployment is not considered ready until load has
completed successfully. Note that load has a timeout of 30 minutes; if load has
not completed by then, the deployment is marked as failed.
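As a sketch of how load might separate expensive setup from construction, the example below reads toy "weights" from the data directory. It uses only the Python standard library; the weights.json file name, the linear toy model, and the temporary directory are assumptions made so the example can run locally.

```python
import json
import tempfile
from pathlib import Path


class Model:
    def __init__(self, data_dir: str, **kwargs):
        self._data_dir = Path(data_dir)
        self._weights = None  # populated by load()

    def load(self):
        # Stand-in for expensive setup such as reading weights from disk.
        # "weights.json" is a hypothetical file name.
        with open(self._data_dir / "weights.json") as f:
            self._weights = json.load(f)

    def predict(self, model_input):
        # Toy linear model: weight * x + bias.
        w = self._weights
        return {"y": w["weight"] * model_input["x"] + w["bias"]}


# Local usage: create a temporary data directory holding toy weights.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "weights.json").write_text(json.dumps({"weight": 2.0, "bias": 1.0}))
    model = Model(data_dir=tmp)
    model.load()
    result = model.predict({"x": 3.0})
```

Because __init__ only records the path, all slow work happens inside load, which is the step covered by the timeout described above.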
predict
The predict method is where you define the logic for performing inference.
The simplest signature for predict is:
model.py
The return value of predict must be JSON-serializable, so it can be:
- dict
- list
- str
predict can also return a Pydantic object.
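A quick way to check a return value locally is to pass it through json.dumps from the Python standard library, which raises TypeError for values it cannot serialize. The helper below is hypothetical and not part of Truss; the predict body and its field names are placeholders.

```python
import json


def is_json_serializable(value) -> bool:
    # Hypothetical helper: json.dumps raises TypeError for unsupported types
    # and ValueError for circular references.
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


class Model:
    def predict(self, model_input):
        # A dict of str, list, and float values, all JSON-serializable.
        return {"label": "positive", "scores": [0.9, 0.1], "input": model_input}


output = Model().predict("great movie")
```

With this helper, is_json_serializable(output) is true for the dictionary above, while a value such as a Python set would be rejected.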
Streaming
In addition to supporting a single request/response cycle, Truss also supports streaming. See the Streaming guide for more information.
Async vs. Sync
Note that the predict method is synchronous by default. However, if your model inference
depends on APIs that require asyncio, predict can also be written as a coroutine.
If you are using asyncio in your predict method, be sure not to perform any blocking
operations, such as a synchronous file download. This can result in degraded performance.
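A coroutine version of predict, together with one way to keep a blocking call off the event loop, can be sketched as follows. The example assumes Python 3.9 or later for asyncio.to_thread; blocking_download is a hypothetical stand-in for a synchronous download, and whether a worker thread is the right choice for a particular deployment is left as an assumption.

```python
import asyncio
import time


def blocking_download() -> str:
    # Stand-in for a synchronous, blocking operation such as a file download.
    time.sleep(0.05)
    return "weights"


class Model:
    async def predict(self, model_input):
        # Non-blocking wait: yields control back to the event loop.
        await asyncio.sleep(0.01)
        # Run the blocking call in a worker thread so the event loop stays free.
        artifact = await asyncio.to_thread(blocking_download)
        return {"input": model_input, "artifact": artifact}


# Local usage: drive the coroutine to completion with asyncio.run.
result = asyncio.run(Model().predict("hello"))
```

Calling blocking_download directly inside the coroutine would stall every other request sharing the event loop for the duration of the call, which is the degraded performance the note above warns about.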