This section covers how to implement the logic for your model. As covered in Build your first model, you define model logic in a model/model.py file. The simplest directory structure contains a config.yaml file at the root and a model/ directory holding model.py.
The model.py file must contain a Model class with these methods:
- __init__ initializes the Model class. Read configuration parameters and other information here.
- load initializes the model. Download model weights or load them onto a GPU here.
- predict runs inference.
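The class below is a minimal sketch of this structure, with all three methods implemented. The uppercasing "model" and the text input key are placeholders for illustration only. They are not part of Truss.

```python
class Model:
    def __init__(self):
        # Runs first: read configuration and runtime information here.
        self._model = None

    def load(self):
        # Runs once before serving: download weights or move them to a GPU.
        # Hypothetical placeholder "model" that uppercases text.
        self._model = str.upper

    def predict(self, model_input):
        # Runs on each request: perform inference on the input.
        return {"output": self._model(model_input["text"])}
```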
__init__
The __init__ method initializes the Model class. Use it to read configuration parameters and runtime information.
The simplest signature for __init__ takes no parameters other than self.
You can also modify the __init__ method to accept the following parameters:
- config: A dictionary containing the contents of the model's config.yaml.
- data_dir: A string containing the path to the model's data directory.
- secrets: A dictionary containing the model's secrets. At runtime, these are populated with the actual values stored on Baseten.
- environment: A dictionary containing the environment for the model, if the model has been deployed to an environment; None otherwise.
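The following sketch accepts all four parameters and stores them on self, so that load and predict can use them later. The attribute names are illustrative. The default of None for environment is a defensive assumption, and the values in the usage line (a model_name of "demo" and a secret named api_key) are hypothetical.

```python
from typing import Optional


class Model:
    def __init__(
        self,
        config: dict,
        data_dir: str,
        secrets: dict,
        environment: Optional[dict] = None,
    ):
        # Parsed contents of config.yaml; model_name is one of its keys.
        self._model_name = config.get("model_name")
        # Path to the data directory, for files bundled with the model.
        self._data_dir = data_dir
        # Secret values, looked up by name when needed (e.g. in load()).
        self._secrets = secrets
        # None unless the model has been deployed to an environment.
        self._environment = environment
        self._model = None

    def load(self):
        pass

    def predict(self, model_input):
        return {"model_name": self._model_name}
```

Outside Truss, you can construct the class by hand to check it, for example Model(config={"model_name": "demo"}, data_dir="/tmp/data", secrets={"api_key": "xyz"}).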
load
The load method is where you define the logic for initializing the model. This might include downloading model weights or loading them onto the GPU.
Unlike the other methods, load does not accept any parameters other than self.
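Because load takes no parameters, anything it needs must be saved on self beforehand, typically in __init__. In the sketch below, the scale setting under model_metadata is a hypothetical example, and a one-line function stands in for real model weights.

```python
class Model:
    def __init__(self, config: dict, **kwargs):
        # load() takes no parameters, so store what it needs on self now.
        # "scale" under model_metadata is a hypothetical setting.
        self._scale = config.get("model_metadata", {}).get("scale", 1.0)
        self._model = None

    def load(self):
        # Expensive one-time setup belongs here (e.g. downloading weights).
        # This placeholder builds a scaling function from the saved setting.
        scale = self._scale
        self._model = lambda x: scale * x

    def predict(self, model_input: dict) -> dict:
        return {"prediction": self._model(model_input["x"])}
```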
The deployment isn't ready to serve requests until load has completed successfully. The load method has a 30-minute timeout. If load hasn't completed by then, the deployment is marked as failed.
predict
The predict method is where you define the logic for performing inference.
The simplest signature for predict takes a single argument, the model input, in addition to self.
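The following is a minimal predict sketch. It assumes a JSON request body that arrives in predict as a Python dict with a hypothetical text key. The word count stands in for real inference.

```python
from typing import Any


class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        pass

    def predict(self, model_input: Any) -> Any:
        # model_input is assumed to be a dict with a hypothetical "text" key.
        text = model_input["text"]
        return {"text": text, "num_words": len(text.split())}
```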
The return value of predict must be JSON-serializable, so it can be one of:
- dict
- list
- str
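You can check whether a return value qualifies by passing it through json.dumps, which raises TypeError for anything it cannot encode (a set, for example). The scores below are placeholder values that stand in for model output.

```python
import json
from typing import Any


class Model:
    def load(self):
        pass

    def predict(self, model_input: Any) -> Any:
        # Placeholder scores standing in for real model output.
        scores = [0.1, 0.7, 0.2]
        best = max(range(len(scores)), key=lambda i: scores[i])
        # A dict holding an int and a list of floats: all JSON-serializable.
        return {"label_index": best, "scores": scores}


result = Model().predict({})
# json.dumps raises TypeError if result contains a non-serializable value.
encoded = json.dumps(result)
print(encoded)
```

The standard json module cannot encode NumPy arrays, so convert them with .tolist() before returning them.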
predict:
Streaming
In addition to supporting a single request/response cycle, Truss also supports streaming. See the Streaming guide for more information.
Async vs. sync
The predict method is synchronous by default. If your model inference depends on APIs that require asyncio, you can write predict as a coroutine by declaring it with async def.
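The following sketch shows predict written as a coroutine. asyncio.sleep stands in for an awaited asyncio-based call, such as a request made through an async HTTP client. The echo response is illustrative only.

```python
import asyncio
from typing import Any


class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        pass

    async def predict(self, model_input: Any) -> Any:
        # Simulates awaiting I/O from an asyncio-based API.
        await asyncio.sleep(0.01)
        return {"echo": model_input}


# Outside Truss, the coroutine must be driven by an event loop.
print(asyncio.run(Model().predict({"prompt": "hi"})))
```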