Baseten Delivery Network (BDN) reduces cold start times by mirroring your model weights to Baseten’s infrastructure and caching them close to your pods. Instead of downloading hundreds of gigabytes from Hugging Face, S3, or GCS on every scale-up, BDN mirrors weights once and serves them from multi-tier caches. Configure BDN using the weights key in your config. This works with both Model class deployments and custom Docker images.

Quick start

Add a weights section to your config.yaml:
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    allow_patterns: ["*.safetensors", "config.json"]
    ignore_patterns: ["*.md", "*.txt"]
Field | Description
source | Where to fetch weights from. Supports Hugging Face, S3, GCS, and R2.
mount_location | Absolute path where weights appear in your container.
allow_patterns | Optional. Only download files matching these patterns. Useful for skipping large files you don’t need. See filtering files.
ignore_patterns | Optional. Exclude files matching these patterns. Useful for skipping documentation or unused formats.
For private or gated models, add auth_secret_name to reference a Baseten secret with your credentials.
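For example, assuming you’ve already created a secret named hf_access_token in your Baseten workspace:
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    auth_secret_name: "hf_access_token"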

Accessing weights in your model

When your model starts, weights are already downloaded and available at your mount_location. The directory structure from the source is preserved:
/models/llama/                           # Your mount_location
├── config.json
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── ...
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
└── original/                            # Subfolders are preserved
    ├── consolidated.00.pth
    └── params.json
Load weights directly from this path in your load() method. No download code needed:
model.py
from transformers import AutoModelForCausalLM
import torch

class Model:
    def load(self):
        # Weights are already available at mount_location
        self._model = AutoModelForCausalLM.from_pretrained(
            "/models/llama",
            torch_dtype=torch.float16,
            device_map="auto"
        )
The mount is read-only. Weights are fetched during truss push and cached, so cold starts only read from local or nearby caches.
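Because the files are guaranteed to be present before load() runs, you can fail fast if anything looks wrong. A minimal sketch; the mount path and expected index file are assumptions matching the example above:
import os

class Model:
    def load(self):
        mount = "/models/llama"  # your mount_location
        # Fail fast with a clear error if the expected files are missing.
        index = os.path.join(mount, "model.safetensors.index.json")
        if not os.path.isfile(index):
            raise RuntimeError(
                f"Expected weights at {mount}, found: {os.listdir(mount)}"
            )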

Configuration reference

weights

A list of weight sources to mount into your model container.
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    auth_secret_name: "hf_access_token"
    allow_patterns: ["*.safetensors", "config.json"]
    ignore_patterns: ["*.md", "*.txt"]
source
string
required
URI specifying where to fetch weights from. Supported schemes:
  • hf://: Hugging Face Hub.
  • s3://: AWS S3.
  • gs://: Google Cloud Storage.
  • r2://: Cloudflare R2.
For Hugging Face sources, specify a revision using @revision suffix (branch, tag, or commit SHA).
mount_location
string
required
Absolute path where weights will be mounted in your container. Must start with /.
mount_location: "/models/llama"  # Correct
mount_location: "models/llama"   # Wrong - not absolute
auth_secret_name
string
Name of a Baseten secret containing credentials for accessing private weight sources. See Source types and authentication for the expected secret format for each source type.
allow_patterns
string[]
File patterns to include. Uses Unix shell-style wildcards. Only matching files will be downloaded.
allow_patterns:
  - "*.safetensors"
  - "config.json"
  - "tokenizer.*"
ignore_patterns
string[]
File patterns to exclude. Uses Unix shell-style wildcards. Matching files will be skipped.
ignore_patterns:
  - "*.md"
  - "*.txt"
  - "*.bin"  # Skip PyTorch .bin files if using safetensors

Source types and authentication

For private weight sources, create a Baseten secret with the appropriate credentials. Manage secrets in your Baseten settings.

Hugging Face

Download weights from Hugging Face Hub repositories.
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    auth_secret_name: "hf_access_token"  # Required for private/gated repos
    allow_patterns: ["*.safetensors", "config.json"]
Format: hf://owner/repo@revision
  • owner/repo: The Hugging Face repository.
  • @revision: Branch, tag, or commit SHA.
Revision pinning: When you use a branch name like @main, Baseten resolves it to the specific commit SHA at deploy time and mirrors those exact files. Your deployment stays pinned to that version. Subsequent scale-ups won’t pick up new commits. To update to newer weights, push a new deployment.
Authentication: Hugging Face API token (plain text)
Secret Name | Secret Value
hf_access_token | hf_xxxxxxxxxxxxxxxx...
Get your token from Hugging Face settings.

AWS S3

Download weights from an S3 bucket.
config.yaml
weights:
  - source: "s3://my-bucket/models/custom-weights"
    mount_location: "/models/custom"
    auth_secret_name: "aws_credentials"
Format: s3://bucket/path
Authentication: JSON with AWS credentials
Secret Name | Secret Value
aws_credentials | {"access_key_id": "AKIA...", "secret_access_key": "...", "region": "us-west-2"}
The region field is required. Optionally include session_token for temporary credentials.
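A small sketch for assembling a correctly shaped secret value (all credential values here are placeholders):
import json

# Placeholder credentials; substitute your own before creating the secret.
secret_value = json.dumps({
    "access_key_id": "AKIA...",
    "secret_access_key": "...",
    "region": "us-west-2",       # required
    # "session_token": "...",    # optional, for temporary credentials
})
print(secret_value)  # paste this as the Baseten secret value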

Google Cloud Storage

Download weights from a GCS bucket.
config.yaml
weights:
  - source: "gs://my-bucket/models/weights"
    mount_location: "/models/gcs-weights"
    auth_secret_name: "gcp_service_account"
Format: gs://bucket/path
Authentication: GCP service account JSON key
Secret Name | Secret Value
gcp_service_account | {"type": "service_account", "project_id": "...", ...}
Download the key from the GCP Console under IAM & Admin > Service Accounts.

Cloudflare R2

Download weights from a Cloudflare R2 bucket.
config.yaml
weights:
  - source: "r2://abc123def.my-bucket/models/weights"
    mount_location: "/models/r2-weights"
    auth_secret_name: "r2_credentials"
Format: r2://account_id.bucket/path
  • account_id: Your Cloudflare account ID.
  • bucket: R2 bucket name, separated from account_id by a period.
  • path: Path prefix within the bucket.
Authentication: JSON with R2 API credentials
Secret Name | Secret Value
r2_credentials | {"access_key_id": "...", "secret_access_key": "..."}
Get your R2 API tokens from the Cloudflare dashboard under R2 > Manage R2 API Tokens.

Migration from model_cache

model_cache is deprecated. Migrate to weights for faster cold starts through multi-tier caching.

Automated migration with truss migrate

The truss migrate CLI command automatically converts model_cache configurations:
# Run in your Truss directory
truss migrate

# Or specify a directory
truss migrate /path/to/truss
The command will:
  1. Show a colorized diff of the proposed changes.
  2. Prompt for confirmation before applying.
  3. Create a backup of your original config.yaml.
  4. Warn about any model.py path changes needed.

Manual migration reference

From model_cache to weights:
model_cache | weights
repo_id: "owner/repo" | source: "hf://owner/repo@rev"
revision: "main" | Included in source URI as @main
kind: "s3" | Prefix: s3://bucket/path
kind: "gcs" | Prefix: gs://bucket/path
kind: "r2" | Prefix: r2://account_id.bucket/path
volume_folder: "name" | mount_location: "/app/model_cache/name"
runtime_secret_name | auth_secret_name
allow_patterns | allow_patterns (same)
ignore_patterns | ignore_patterns (same)
Example migration:
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/app/model_cache/llama"
    allow_patterns:
      - "*.safetensors"
      - "config.json"
    auth_secret_name: hf_access_token
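For comparison, the legacy model_cache entry this replaces would look roughly like the following (reconstructed from the mapping table above; your exact legacy fields may differ):
config.yaml
model_cache:
  - repo_id: "meta-llama/Llama-3.1-8B"
    revision: "main"
    use_volume: true
    volume_folder: "llama"
    runtime_secret_name: "hf_access_token"
    allow_patterns:
      - "*.safetensors"
      - "config.json"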

Chains migration

For Truss Chains, update Assets.cached to Assets.weights in your Python code:
import truss_chains as chains
from truss.base import truss_config

class MyChainlet(chains.ChainletBase):
    remote_config = chains.RemoteConfig(
        assets=chains.Assets(
            weights=[
                truss_config.WeightsSource(
                    source="hf://meta-llama/Llama-3.1-8B@main",
                    mount_location="/app/model_cache/llama",
                    auth_secret_name="hf_access_token",
                    allow_patterns=["*.safetensors", "config.json"],
                )
            ],
            secret_keys=["hf_access_token"],
        ),
    )
Key changes:
  • ModelRepo → WeightsSource.
  • repo_id + revision → source URI with @revision suffix.
  • volume_folder → mount_location (must be an absolute path).
  • runtime_secret_name → auth_secret_name.
  • Remove use_volume and kind (inferred from URI scheme).

Custom server migration

If you’re using a custom server with model_cache, you’ll need to make additional changes when migrating to weights:
  1. Remove truss-transfer-cli from your start_command. With weights, files are pre-mounted before your container starts.
  2. Update file paths from /app/model_cache/{volume_folder} to your new mount_location.
config.yaml
docker_server:
  # No truss-transfer-cli needed - weights are pre-mounted
  start_command: text-embeddings-router --port 7997 --model-id /models/jina --max-client-batch-size 128
weights:
  - source: "hf://jinaai/jina-embeddings-v2-base-code@516f4baf..."
    mount_location: "/models/jina"
    ignore_patterns: ["*.onnx"]

Best practices

Pin to specific commits

Avoid using branch names like @main in production. While Baseten pins to the commit SHA at deploy time, using @main means each new deployment may get different weights, making debugging and rollbacks difficult.
Always pin to a specific commit SHA for reproducible deployments:
# Recommended - reproducible across deploys
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@5206a32e7b8a9f1c..."
    mount_location: "/models/llama"

# Not recommended for production - each new deployment resolves to a different commit
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
To find the current commit SHA for a Hugging Face repo:
# Using the Hugging Face CLI
huggingface-cli repo-info meta-llama/Llama-3.1-8B --revision main
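
# Or resolve it programmatically with the huggingface_hub Python client
# (this uses huggingface_hub's API, not Truss):
from huggingface_hub import HfApi

# Resolve the commit SHA that a branch currently points to.
info = HfApi().model_info("meta-llama/Llama-3.1-8B", revision="main")
print(info.sha)  # use in your source URI: hf://meta-llama/Llama-3.1-8B@<sha>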

Filter files with patterns

Only download what you need to minimize cold start time:
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    allow_patterns:
      - "*.safetensors"    # Model weights
      - "config.json"      # Model config
      - "tokenizer.*"      # Tokenizer files
    ignore_patterns:
      - "*.bin"            # Skip PyTorch format if using safetensors
      - "*.md"             # Skip documentation
      - "*.txt"            # Skip text files

Use absolute mount paths

The mount_location must be an absolute path (starting with /):
# Correct
mount_location: "/models/llama"
mount_location: "/app/model_cache/my-model"

# Wrong - will fail validation
mount_location: "models/llama"
mount_location: "./my-model"

Keep mount locations unique

Each weight source must have a unique mount_location:
# Correct - different paths
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
  - source: "hf://sentence-transformers/all-MiniLM-L6-v2@main"
    mount_location: "/models/embeddings"

# Wrong - duplicate paths will fail
weights:
  - source: "hf://model-a@main"
    mount_location: "/models/shared"
  - source: "hf://model-b@main"
    mount_location: "/models/shared"

When weights are re-mirrored

Baseten caches weights based on a hash of their configuration and reuses cached weights when possible to avoid redundant downloads.
Deduplication and mutation detection: Baseten deduplicates files based on their etag (a content hash), not just filename, and only re-mirrors files that have been mutated since the last pull. Unchanged files are reused from blob storage, even across deployments.
Changes that trigger re-mirroring:
Field | Re-mirrors? | Why
source | ✅ Yes | Different repository, revision, or path
allow_patterns | ✅ Yes | Different files will be downloaded
ignore_patterns | ✅ Yes | Different files will be downloaded
Changes that do NOT trigger re-mirroring:
Field | Re-mirrors? | Why
auth_secret_name | ❌ No | Credentials don’t affect which files are mirrored
mount_location | ❌ No | Only affects where weights appear in your container
To force a fresh download of weights that haven’t changed, modify the source to point to a specific commit SHA instead of a branch name, or add a trivial change to allow_patterns.
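For example, adding a pattern is enough to change the configuration hash (assuming, per the table above, that the hash covers the literal pattern list):
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    allow_patterns:
      - "*.safetensors"
      - "config.json"
      - "tokenizer.*"  # newly added pattern changes the config hash, forcing a re-mirror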

How it works

What happens when you truss push

Your truss push command returns immediately after the deployment is created in Baseten. The mirroring process runs in the background, but your model will not be deployed to the Workload Plane until mirroring completes. This ensures weights are available before your model pod starts.

What happens on cold start

Baseten runs multiple Workload Planes across regions and clusters. Each Workload Plane has its own in-cluster cache for fast weight delivery. When your model pod starts:
  1. The BDN Agent on the node fetches the manifest for your weights.
  2. Weights are downloaded through the In-Cluster Cache (shared across pods in the cluster).
  3. Weights are stored in the Node Cache (part of the BDN Agent, shared across pods on the same node).
  4. Weights are mounted read-only to your model pod.

Key benefits

  • Non-blocking push → truss push returns immediately; mirroring happens in the background.
  • One-time mirroring → Weights are mirrored to Baseten storage once, not on every cold start.
  • No upstream dependency at runtime → Once mirrored, scale-ups and inference never contact the original source.
  • Multi-tier caching → In-cluster cache prevents redundant downloads; node cache provides instant access for subsequent replicas.
  • Deduplication → Identical weight files are stored once and shared via hardlinks.
  • Parallel downloads → Large models download faster with concurrent chunk fetching.
