Your model code lives in the model/model.py file. To recap, the simplest directory structure for a model is:
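A sketch of that layout, following the usual Truss convention of a config.yaml beside a model/ package:

```
config.yaml
model/
    model.py
```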
The model.py file contains a class with particular methods:
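A minimal sketch of that class, with the three methods described in the list that follows (the bodies are placeholders):

```python
# model/model.py
class Model:
    def __init__(self, **kwargs):
        ...

    def load(self):
        ...

    def predict(self, model_input):
        ...
```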
- The __init__ method is used to initialize the Model class, and allows you to read in configuration parameters and other information.
- The load method is where you define the logic for initializing the model. This might include downloading model weights, or loading them onto a GPU.
- The predict method is where you define the logic for inference.
__init__
As mentioned above, the __init__ method is used to initialize the Model class, and allows you to read in configuration parameters and runtime information.
The simplest signature for __init__ is:
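A minimal sketch, assuming the runtime information arrives as keyword arguments (described in the list below):

```python
# model/model.py
class Model:
    def __init__(self, **kwargs):
        self._model = None
```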
The keyword arguments passed to __init__ include the following; an example that uses them appears after the list:
- config: A dictionary containing the config.yaml for the model.
- data_dir: A string containing the path to the data directory for the model.
- secrets: A dictionary containing the secrets for the model. Note that at runtime, these will be populated with the actual values as stored on Baseten.
- environment: A string containing the environment for the model, if the model has been deployed to an environment.
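For example, an __init__ that stores these values for later use might look like this sketch (environment is read with .get on the assumption that it is only present for models deployed to an environment):

```python
# model/model.py
class Model:
    def __init__(self, **kwargs):
        self._config = kwargs["config"]          # parsed config.yaml
        self._data_dir = kwargs["data_dir"]      # path to the model's data directory
        self._secrets = kwargs["secrets"]        # secrets, populated at runtime on Baseten
        self._environment = kwargs.get("environment")
        self._model = None
```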
load
The load method is where you define the logic for initializing the model. As mentioned before, this might include downloading model weights or loading them onto the GPU.
load, unlike the other methods mentioned, does not accept any parameters:
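A minimal sketch; the Hugging Face transformers pipeline is purely illustrative:

```python
# model/model.py
from transformers import pipeline

class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Runs once at startup: download weights and initialize the model.
        self._model = pipeline("text-classification")
```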
The model only starts serving traffic once load has completed successfully. Note that there is a timeout of 30 minutes for this, after which, if load has not completed, the deployment will be marked as failed.
predict
The predict method is where you define the logic for performing inference. The simplest signature for predict is:
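A minimal sketch; model_input holds the deserialized request body, and this trivial implementation simply echoes it back:

```python
# model/model.py
class Model:
    def predict(self, model_input):
        return model_input
```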
The input to and output of predict must be JSON-serializable, so it can be a:
- dict
- list
- str
- Pydantic object
Here is an example of predict:
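A sketch that reuses the text-classification pipeline from the load example above; the request and response shapes are illustrative:

```python
# model/model.py
class Model:
    def predict(self, model_input):
        # e.g. model_input == {"text": "I love this product"}
        results = self._model(model_input["text"])
        return {"predictions": results}
```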
Streaming
In addition to supporting a single request/response cycle, Truss also supports streaming. See the Streaming guide for more information.
Async vs. Sync
Note that the predict method is synchronous by default. However, if your model inference depends on APIs that require asyncio, predict can also be written as a coroutine.
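A sketch of predict written as a coroutine; the httpx client and the upstream URL are assumptions for the sake of the example:

```python
# model/model.py
import httpx

UPSTREAM_URL = "https://example.com/v1/generate"  # illustrative placeholder

class Model:
    async def predict(self, model_input):
        # Awaiting the call keeps the event loop free for other requests.
        async with httpx.AsyncClient() as client:
            response = await client.post(UPSTREAM_URL, json=model_input)
        return response.json()
```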
If you are using asyncio in your predict method, be sure not to perform any blocking operations, such as a synchronous file download. This can result in degraded performance.