Truss gives you full control over how your model initializes, loads weights, and handles requests through a `model.py` file. Truss provides a `Model` class with three methods (`__init__`, `load`, and `predict`) that define each of those stages.
Most deployments don’t need custom Python at all. If you’re deploying a supported open-source model, see Your first model for the config-only approach. Use custom model code when you need to:
- Run a model architecture that Baseten’s engines don’t support.
- Add custom preprocessing or postprocessing around inference.
- Combine multiple models or libraries in a single endpoint.
Prerequisites
Install Truss with uv (recommended) or with pip (macOS/Linux/Windows).
Initialize your model
Create a new Truss project with `truss init`.
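For example (the directory name `my-model` is arbitrary):

```shell
truss init my-model
cd my-model
```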
- `config.yaml`: Configuration for dependencies, resources, and deployment settings.
- `model/model.py`: Your model code.
- `packages/`: Optional local Python packages.
- `data/`: Optional data files bundled with your model.
config.yaml
The `config.yaml` file configures dependencies, resources, and other settings. Here’s the default:
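A sketch of what the generated default looks like (field values here are illustrative and may differ by Truss version):

```yaml
model_name: my-model
python_version: py311
requirements: []
resources:
  accelerator: null
  cpu: "1"
  memory: 2Gi
  use_gpu: false
secrets: {}
```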
- `requirements`: Python packages installed at build time (pip format).
- `resources`: CPU, memory, and GPU allocation.
- `secrets`: Secret names your model needs at runtime, such as HuggingFace API keys.
model.py
The `model.py` file defines a `Model` class with three methods:
- `__init__`: Runs when the class is created. Initialize variables and store configuration here.
- `load`: Runs once at startup, before any requests. Load model weights, tokenizers, and other heavy resources here. Separating this from `__init__` keeps expensive operations out of the request path.
- `predict`: Runs on every API request. Process input, run inference, and return the response.
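A minimal sketch of the three methods, using a stand-in callable instead of real weights (the keyword arguments Truss passes to `__init__` include items such as the parsed config and secrets):

```python
class Model:
    def __init__(self, **kwargs):
        # Runs when the class is created: store configuration, initialize state.
        self._config = kwargs.get("config", {})
        self._model = None

    def load(self):
        # Runs once at startup, before any requests: load heavy resources here.
        # A real model would load weights; this sketch assigns a toy callable.
        self._model = lambda text: text.upper()

    def predict(self, model_input):
        # Runs on every API request: process input, run inference, respond.
        return {"output": self._model(model_input["text"])}
```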
Deploy your model
Deploy with `truss push --watch`.
Invoke your model
After deployment, call your model at the invocation URL.
Example: text classification
To see the `Model` class in action, deploy a text classification model from HuggingFace using the transformers library.
config.yaml
Add `transformers` and `torch` as dependencies:
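For instance (resource values are illustrative, and pinning package versions is optional but recommended):

```yaml
requirements:
  - transformers
  - torch
resources:
  cpu: "1"
  memory: 2Gi
  use_gpu: false
```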
model.py
Load the classification pipeline in `load` and run it in `predict`:
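A sketch using the default text-classification pipeline (the transformers import is deferred into `load` so the heavy dependency stays out of `__init__`; in production you would pass an explicit model name to `pipeline`):

```python
class Model:
    def __init__(self, **kwargs):
        self._pipeline = None

    def load(self):
        # Deferred import: the heavy dependency loads once at startup,
        # and the pipeline call downloads the default classification model.
        from transformers import pipeline
        self._pipeline = pipeline("text-classification")

    def predict(self, model_input):
        # Returns a list like [{"label": ..., "score": ...}].
        return self._pipeline(model_input["text"])
```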
Deploy and call
Deploy with `truss push --watch`, then call the endpoint:
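A hypothetical call, assuming Baseten's standard invocation URL format with a placeholder model ID (`abc123`); substitute your deployment's model ID and your API key:

```shell
curl -X POST "https://model-abc123.api.baseten.co/production/predict" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{"text": "Truss makes model deployment simple."}'
```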
Next steps
- Configuration: Full reference for `config.yaml` options.
- Implementation: Advanced model patterns including streaming, async, and custom health checks.
- Your first model: Deploy a model with just a config file, no custom Python needed.