Model files, such as weights, can be large (often several GB). Truss supports multiple ways to load them efficiently:

  • Public Hugging Face models (default)
  • Bundled directly in Truss
  • Private cloud storage, such as S3

1. Bundling Model Weights in Truss

Store model files inside your Truss in the data/ directory.

Example: Stable Diffusion 2.1 Truss structure

data/
    scheduler/
        scheduler_config.json
    text_encoder/
        config.json
        diffusion_pytorch_model.bin
    tokenizer/
        merges.txt
        tokenizer_config.json
        vocab.json
    unet/
        config.json
        diffusion_pytorch_model.bin
    vae/
        config.json
        diffusion_pytorch_model.bin
    model_index.json

Access bundled files in model.py:

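import torch
from diffusers import StableDiffusionPipeline

class Model:
    def __init__(self, **kwargs):
        # Truss passes the path of the bundled data/ directory as data_dir.
        self._data_dir = kwargs["data_dir"]

    def load(self):
        # Load the pipeline from the bundled files instead of downloading it.
        self.model = StableDiffusionPipeline.from_pretrained(
            str(self._data_dir),
            revision="fp16",
            torch_dtype=torch.float16,
        ).to("cuda")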

Limitation: Bundled weights increase the size of the deployment, which makes deploying slower. For large models, consider loading weights from cloud storage instead.
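
To judge whether bundling is practical, it can help to measure how much the data/ directory actually holds before deploying. The following is a minimal sketch using only the standard library; the helper names dir_size_bytes and format_gib are hypothetical and not part of Truss.

```python
from pathlib import Path


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


if __name__ == "__main__":
    # Assumes the script is run from the root of the Truss directory.
    data_dir = Path("data")
    if data_dir.is_dir():
        print(f"{data_dir}/ holds {format_gib(dir_size_bytes(data_dir))}")
    else:
        print("No data/ directory found here.")
```

If the reported size runs to many GiB, that is a hint that loading from cloud storage may suit the model better than bundling.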

2. Loading Private Model Weights from S3

If your weights are in a private S3 bucket, first configure secure authentication.

Step 1: Define AWS Secrets in config.yaml

secrets:
  aws_access_key_id: null
  aws_secret_access_key: null
  aws_region: null # e.g., us-east-1
  aws_bucket: null

The null values are placeholders. Do not store actual credentials here. Add them securely to the Baseten secrets manager.
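
Because config.yaml only holds placeholders, a secret that was never added to the secrets manager may surface as a confusing authentication error later. One option is to fail early with a clear message. The sketch below assumes the secrets object passed to the model behaves like a standard mapping (supports .get and indexing); the helper require_secrets is hypothetical, not a Truss API.

```python
# Secret names taken from the config.yaml example above.
REQUIRED_AWS_SECRETS = (
    "aws_access_key_id",
    "aws_secret_access_key",
    "aws_region",
    "aws_bucket",
)


def require_secrets(secrets, names=REQUIRED_AWS_SECRETS):
    """Return {name: value} for each required secret, or raise if any is unset."""
    secrets = secrets or {}
    # Treat absent, None, and empty-string values as missing.
    missing = [name for name in names if not secrets.get(name)]
    if missing:
        raise ValueError("Missing secrets: " + ", ".join(missing))
    return {name: secrets[name] for name in names}
```

Calling require_secrets(kwargs.get("secrets")) at the top of __init__ would then name the missing secrets before any AWS call is made.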

Step 2: Authenticate with AWS in model.py

import boto3

class Model:
    def __init__(self, **kwargs):
        self._config = kwargs.get("config")
        # Secrets declared in config.yaml are resolved at runtime.
        secrets = kwargs.get("secrets")
        self.s3_client = boto3.client(
            "s3",
            aws_access_key_id=secrets["aws_access_key_id"],
            aws_secret_access_key=secrets["aws_secret_access_key"],
            region_name=secrets["aws_region"],
        )
        self.s3_bucket = secrets["aws_bucket"]
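
With an authenticated client in place, the weights still have to be fetched before the model can load them. The following is a minimal sketch of one way to do that, assuming the weights sit under a single key prefix in the bucket. The helper download_prefix, the prefix, and the destination directory are hypothetical; the client calls used (get_paginator("list_objects_v2") and download_file) are standard boto3 S3 client methods.

```python
from pathlib import Path


def download_prefix(s3_client, bucket, prefix, dest_dir):
    """Download every object under prefix into dest_dir, keeping relative paths."""
    dest_dir = Path(dest_dir)
    downloaded = []
    # list_objects_v2 returns at most 1000 keys per call, so paginate.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "folder" placeholder objects
            # Path of the object relative to the prefix; fall back to the
            # file name when the key is exactly the prefix.
            relative = key[len(prefix):].lstrip("/") or Path(key).name
            target = dest_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            s3_client.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded
```

Inside load(), this could be called as download_prefix(self.s3_client, self.s3_bucket, "weights/", "/tmp/weights"), where both the prefix and the local path are placeholders, and the resulting directory passed to from_pretrained.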

Step 3: Deploy

truss push