# Custom model code

> Deploy a model with custom Python using the Truss Model class.

When you need custom preprocessing, postprocessing, or want to run a model that isn't supported by Baseten's built-in engines, you can write Python code in a `model.py` file. Truss provides a `Model` class with three methods (`__init__`, `load`, and `predict`) that give you full control over how your model initializes, loads weights, and handles requests.

Most deployments don't need custom Python at all. If you're deploying a supported open-source model, see [Your first model](/development/model/build-your-first-model) for the config-only approach. Use custom model code when you need to:

* Run a model architecture that Baseten's engines don't support.
* Add custom preprocessing or postprocessing around inference.
* Combine multiple models or libraries in a single endpoint.

## Prerequisites

You need [uv](https://docs.astral.sh/uv/) installed and a [Baseten account](https://app.baseten.co/signup) with an [API key](https://app.baseten.co/settings/account/api_keys).

## Initialize your model

Create a new Truss project with `truss init`.

```bash theme={"system"}
$ truss init hello-world
? 📦 Name this model: HelloWorld
Truss HelloWorld was created in ~/hello-world
```

This creates a directory with the following structure:

* `config.yaml`: Configuration for dependencies, resources, and deployment settings.
* `model/model.py`: Your model code.
* `packages/`: Optional local Python packages.
* `data/`: Optional data files bundled with your model.

### config.yaml

The `config.yaml` file configures dependencies, resources, and other settings. Here's the default:

```yaml config.yaml theme={"system"}
build_commands: []
environment_variables: {}
external_package_dirs: []
model_metadata: {}
model_name: HelloWorld
python_version: py311
requirements: []
resources:
  accelerator: null
  cpu: '1'
  memory: 2Gi
  use_gpu: false
secrets: {}
system_packages: []
```

The fields you'll use most often:

* `requirements`: Python packages installed at build time (pip format).
* `resources`: CPU, memory, and GPU allocation.
* `secrets`: Secret names your model needs at runtime, such as Hugging Face API keys.

See the [Configuration](/development/model/configuration) page for the full reference.

### model.py

The `model.py` file defines a `Model` class with three methods:

```python theme={"system"}
class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        pass

    def predict(self, model_input):
        return model_input
```

* `__init__`: Runs when the class is created. Initialize variables and store configuration here.
* `load`: Runs once at startup, before any requests. Load model weights, tokenizers, and other heavy resources here. Doing this work in `load` rather than in `predict` keeps expensive operations out of the request path.
* `predict`: Runs on every API request. Process input, run inference, and return the response.
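
The order in which the three methods run can be imitated locally. The following is a minimal sketch, not how Baseten's server actually drives the class: `run_locally`, the `_prefix` attribute, and the `_ready` flag are hypothetical names used only for illustration.

```python
class Model:
    def __init__(self, **kwargs):
        # Cheap setup only: store configuration, allocate nothing heavy.
        self._prefix = "echo: "
        self._ready = False

    def load(self):
        # Stand-in for loading weights or tokenizers; runs once at startup.
        self._ready = True

    def predict(self, model_input):
        # Runs per request and relies on state prepared in load().
        if not self._ready:
            raise RuntimeError("load() must run before predict()")
        return self._prefix + str(model_input)


def run_locally(model_input):
    # Hypothetical helper that imitates the order: __init__ -> load -> predict.
    model = Model()
    model.load()
    return model.predict(model_input)


if __name__ == "__main__":
    print(run_locally("some text"))
```

Calling `predict` before `load` raises an error in this sketch, which makes the dependency between the two methods explicit during local testing.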

<Warning>
  `load` and `predict` don't run on the same thread, which matters for GPU workloads where state can be tied to the creating thread (such as CUDA contexts). With sync `predict` and the default `predict_concurrency` of 1, successive `predict` calls often reuse the same worker thread, but Baseten doesn't guarantee it.
</Warning>
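
One way to cope with this is to create thread-bound state lazily, on the thread that actually uses it. The sketch below uses Python's standard `threading.local` for that purpose. It's an illustration under assumptions rather than a Baseten recommendation: the `_handle` helper and the dictionary standing in for a GPU handle are hypothetical.

```python
import threading


class Model:
    def __init__(self, **kwargs):
        # One storage slot per thread; nothing is created yet.
        self._local = threading.local()

    def load(self):
        # Load thread-agnostic resources here (plain Python objects, file contents).
        self._labels = ["negative", "positive"]

    def _handle(self):
        # Hypothetical helper: build the thread-bound object on first use
        # by the calling thread, then reuse it on later calls from that thread.
        if not hasattr(self._local, "handle"):
            self._local.handle = {"owner": threading.get_ident()}
        return self._local.handle

    def predict(self, model_input):
        handle = self._handle()
        same_thread = handle["owner"] == threading.get_ident()
        return {"input": model_input, "same_thread": same_thread}
```

Because each thread builds its own handle, `predict` never touches an object that a different thread created, whichever worker thread serves the request.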

## Give your model access to files

Most models need additional files at runtime: weights, tokenizers, configs, or reference datasets. For local files under about 1 GB total, bundle them in your Truss's `data/` directory and access them in `__init__` through `kwargs["data_dir"]`:

```python model.py theme={"system"}
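from transformers import AutoTokenizer


class Model:
    def __init__(self, **kwargs):
        # Path to the bundled data/ directory.
        self._data_dir = kwargs["data_dir"]

    def load(self):
        # Load the tokenizer files from data/ once at startup.
        self.tokenizer = AutoTokenizer.from_pretrained(str(self._data_dir))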
```

For larger weights or remote sources (Hugging Face, S3, GCS, R2), use the [Baseten Delivery Network (BDN)](/development/model/bdn) instead. It mirrors weights once and serves them from caches close to your pods. For all options, see [Data and storage](/development/model/data-directory).
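
The same `data_dir` pattern works for any small bundled file, not only tokenizers. As a minimal sketch, assume a hypothetical `labels.json` file in `data/` that holds a JSON list of class names. The file name and the index-to-label lookup are illustrative assumptions.

```python
import json
from pathlib import Path


class Model:
    def __init__(self, **kwargs):
        # data_dir points at the bundled data/ directory.
        self._data_dir = Path(kwargs["data_dir"])
        self._labels = None

    def load(self):
        # labels.json is a hypothetical file placed in data/.
        with open(self._data_dir / "labels.json", encoding="utf-8") as f:
            self._labels = json.load(f)

    def predict(self, model_input):
        # Map a class index to its label name.
        return self._labels[int(model_input)]
```

To try this locally, construct the class with `Model(data_dir="path/to/data")`, call `load()`, then call `predict(0)`.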

## Deploy your model

Deploy with `truss push --watch`.

```bash theme={"system"}
$ truss push --watch
```

This packages your code and config, builds a container, and deploys it to Baseten.

## Invoke your model

After deployment, call your model at the invocation URL:

```bash theme={"system"}
$ curl -X POST https://model-{model-id}.api.baseten.co/development/predict \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '"some text"'
```

You should see:

```output theme={"system"}
"some text"
```
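
The same call can be made from Python. This sketch uses only the standard library and mirrors the curl command above. The helper names `build_predict_request` and `call_model` are hypothetical, `model_id` is a placeholder for your own model ID, and the `Content-Type` header is an addition in this sketch that the curl command doesn't set.

```python
import json
import urllib.request


def build_predict_request(model_id, api_key, payload):
    # URL pattern copied from the curl example; model_id is a placeholder.
    url = f"https://model-{model_id}.api.baseten.co/development/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            # Assumption: declare the body as JSON explicitly.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_model(model_id, api_key, payload):
    # Send the request and decode the JSON response body.
    request = build_predict_request(model_id, api_key, payload)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `call_model(model_id, os.environ["BASETEN_API_KEY"], "some text")`, with the model ID taken from your deployment.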

## Example: text classification

To see the `Model` class in action, deploy a text classification model from Hugging Face using the `transformers` library.

### Update config.yaml

Add `transformers` and `torch` as dependencies:

```yaml config.yaml theme={"system"}
requirements:
  - transformers
  - torch
```

### Update model.py

Load the classification pipeline in `load` and run it in `predict`:

```python model.py theme={"system"}
from transformers import pipeline

class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        self._model = pipeline("text-classification")

    def predict(self, model_input):
        return self._model(model_input)
```
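
Custom preprocessing and postprocessing both belong in `predict`. The variant below is a sketch under assumptions, not part of the example above: it accepts either a bare string or a `{"text": ...}` object (a hypothetical input schema), validates it, and trims the pipeline output to a single label and a rounded score. It imports `pipeline` inside `load` so the module can be imported for local testing without `transformers` installed.

```python
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Imported here so the module imports without transformers installed.
        from transformers import pipeline

        self._model = pipeline("text-classification")

    def predict(self, model_input):
        # Preprocess: accept a bare string or {"text": "..."} (hypothetical schema).
        if isinstance(model_input, dict):
            text = model_input.get("text")
        else:
            text = model_input
        if not isinstance(text, str) or not text.strip():
            raise ValueError("expected a non-empty string")

        # For a single string, the pipeline returns a list with one
        # {"label": ..., "score": ...} dict.
        result = self._model(text.strip())[0]

        # Postprocess: keep the label and round the score.
        return {"label": result["label"], "score": round(float(result["score"]), 4)}
```

Rejecting bad input before it reaches the pipeline keeps failures cheap and gives callers a clear error instead of a library traceback.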

### Deploy and call

Deploy with `truss push --watch`, then call the endpoint:

```bash theme={"system"}
$ truss push --watch
```

```bash theme={"system"}
$ curl -X POST https://model-{model-id}.api.baseten.co/development/predict \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '"some text"'
```

## Next steps

* [Implementation](/development/model/implementation): Core `Model` lifecycle, method signatures, and sync vs. async `predict` patterns.
* [HTTP endpoints](/development/model/http-endpoints): Add `chat_completions`, `completions`, `embeddings`, `messages`, or `responses` to serve matching `/v1/*` routes from custom model code.
* [Streaming output](/development/model/streaming): Return generated output incrementally instead of waiting for the full response.
* [Custom health checks](/development/model/custom-health-checks): Define readiness and liveness behavior for custom model logic.
* [Configuration](/development/model/configuration): Full reference for `config.yaml` options.
* [Your first model](/development/model/build-your-first-model): Deploy a model with just a config file, no custom Python needed.
