Custom model code

When you need custom preprocessing, postprocessing, or want to run a model that isn’t supported by Baseten’s built-in engines, you can write Python code in a model.py file. Truss provides a Model class with three methods (__init__, load, and predict) that give you full control over how your model initializes, loads weights, and handles requests.

Most deployments don’t need custom Python at all. If you’re deploying a supported open-source model, see Your first model for the config-only approach.

Use custom model code when you need to:

Run a model architecture that Baseten’s engines don’t support.
Add custom preprocessing or postprocessing around inference.
Combine multiple models or libraries in a single endpoint.

Prerequisites

Install Truss:

uv (recommended):

uv venv && source .venv/bin/activate
uv pip install --upgrade truss

pip (macOS/Linux):

python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade truss

pip (Windows):

python3 -m venv .venv && .venv\Scripts\activate
pip install --upgrade truss

You also need a Baseten account with an API key.

Initialize your model

Create a new Truss project with truss init.

$ truss init hello-world
? 📦 Name this model: HelloWorld
Truss HelloWorld was created in ~/hello-world

This creates a directory with the following structure:

config.yaml: Configuration for dependencies, resources, and deployment settings.
model/model.py: Your model code.
packages/: Optional local Python packages.
data/: Optional data files bundled with your model.

config.yaml

The config.yaml file configures dependencies, resources, and other settings. Here’s the default:

config.yaml

build_commands: []
environment_variables: {}
external_package_dirs: []
model_metadata: {}
model_name: HelloWorld
python_version: py311
requirements: []
resources:
  accelerator: null
  cpu: '1'
  memory: 2Gi
  use_gpu: false
secrets: {}
system_packages: []

The fields you’ll use most often:

requirements: Python packages installed at build time (pip format).
resources: CPU, memory, and GPU allocation.
secrets: Names of the secrets your model needs at runtime, such as a Hugging Face API key.

See the Configuration page for the full reference.

model.py

The model.py file defines a Model class with three methods:

class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        pass

    def predict(self, model_input):
        return model_input

__init__: Runs when the Model object is instantiated. Initialize variables and store configuration here.
load: Runs once at startup, before any requests. Load model weights, tokenizers, and other heavy resources here. Doing this work in load, rather than in predict, keeps expensive operations out of the request path.
predict: Runs on every API request. Process input, run inference, and return the response.
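
The division of labor between the three methods can be sketched with a toy model that needs no external libraries. This is an illustration, not Truss's own implementation: the word-polarity table stands in for real weights, the run_locally helper is hypothetical, and the "config" and "secrets" keyword arguments are an assumption about what is passed to __init__ (read with .get() so the class still works if they are absent).

```python
class Model:
    def __init__(self, **kwargs):
        # Assumption: runtime context such as "config" and "secrets" arrives
        # as keyword arguments. Only cheap bookkeeping happens here.
        self._config = kwargs.get("config", {})
        self._secrets = kwargs.get("secrets", {})
        self._vocab = None  # populated in load()

    def load(self):
        # Stand-in for loading weights: build a small lookup table once.
        self._vocab = {"good": 1, "great": 1, "bad": -1, "awful": -1}

    def predict(self, model_input):
        # Per-request work: score the text by summing word polarities.
        words = str(model_input).lower().split()
        score = sum(self._vocab.get(word, 0) for word in words)
        return {"input": model_input, "score": score}


def run_locally(model_input):
    # Hypothetical helper that calls the methods in lifecycle order:
    # construct, load once, then predict.
    model = Model(config={"model_name": "HelloWorld"}, secrets={})
    model.load()
    return model.predict(model_input)
```

Calling run_locally("good great bad") exercises all three methods in order, which is a quick way to catch mistakes before deploying.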

Deploy your model

Deploy with truss push --watch.

$ truss push --watch

This packages your code and config, builds a container, and deploys it to Baseten.

Invoke your model

After deployment, call your model at the invocation URL:

$ curl -X POST https://model-{model-id}.api.baseten.co/development/predict \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '"some text"'

You should see:

"some text"

Example: text classification

To see the Model class in action, deploy a text classification model from Hugging Face using the transformers library.

config.yaml

Add transformers and torch as dependencies:

config.yaml

requirements:
  - transformers
  - torch

model.py

Load the classification pipeline in load and run it in predict:

model.py

from transformers import pipeline

class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        self._model = pipeline("text-classification")

    def predict(self, model_input):
        return self._model(model_input)
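
If you want preprocessing and postprocessing around the pipeline call, one possible shape is sketched below. To keep the sketch runnable without downloading weights, load assigns a hypothetical _stub_classifier that only mimics the list-of-dicts output of a text-classification pipeline; in a deployment you would assign pipeline("text-classification") instead. The {"text": ...} input format and the error dictionary are illustrative choices, not something Truss requires.

```python
def _stub_classifier(texts):
    # Hypothetical stand-in for pipeline("text-classification"): returns one
    # {"label", "score"} dict per input text, like the real pipeline does.
    return [
        {"label": "POSITIVE" if "good" in text else "NEGATIVE", "score": 0.9}
        for text in texts
    ]


class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # In a real deployment: self._model = pipeline("text-classification")
        self._model = _stub_classifier

    def predict(self, model_input):
        # Preprocess: accept a bare string or {"text": ...}, then trim it.
        if isinstance(model_input, dict):
            text = model_input.get("text", "")
        else:
            text = model_input
        text = str(text).strip()
        if not text:
            return {"error": "empty input"}

        # Inference: classify a single-item batch and take the first result.
        result = self._model([text])[0]

        # Postprocess: flatten the result and round the score.
        return {
            "text": text,
            "label": result["label"],
            "score": round(result["score"], 3),
        }
```

Keeping the cleanup and formatting inside predict means callers can send loosely formatted input and still get a flat, predictable response.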

Deploy and call

Deploy with truss push --watch, then call the endpoint:

$ truss push --watch

$ curl -X POST https://model-{model-id}.api.baseten.co/development/predict \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '"some text"'
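
The same call can be made from Python. The sketch below uses only the standard library and reuses the URL pattern and Authorization header from the curl command above. The function names are hypothetical, and the Content-Type header is an addition that the curl example does not send explicitly.

```python
import json
import urllib.request


def build_predict_request(model_id, api_key, payload):
    # URL pattern taken from the curl example (development deployment).
    url = f"https://model-{model_id}.api.baseten.co/development/predict"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Api-Key {api_key}",
            # Assumption: the request body is declared as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_model(model_id, api_key, payload):
    # Sends the request and decodes the JSON response body.
    request = build_predict_request(model_id, api_key, payload)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call_model(model_id, os.environ["BASETEN_API_KEY"], "some text") sends the same payload as the curl command, provided you import os and substitute your own model ID.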

Next steps

Configuration: Full reference for config.yaml options.
Implementation: Advanced model patterns including streaming, async, and custom health checks.
Your first model: Deploy a model with just a config file, no custom Python needed.
