Baseten Delivery Network (BDN) reduces cold start times by mirroring your model weights to Baseten’s infrastructure and caching them close to your pods. Instead of downloading hundreds of gigabytes from Hugging Face, S3, or GCS on every scale-up, BDN mirrors weights once and serves them from multi-tier caches. Configure BDN using the weights key in your config. This works with both Model class deployments and custom Docker images.

BDN mirrors any supported source the same way. If your weights are only on local disk, bundle them with your Truss for small models, or push them to a private Hugging Face repository for large ones.

Quick start

Add a weights section to your config.yaml:
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    allow_patterns: ["*.safetensors", "config.json"]
    ignore_patterns: ["*.md", "*.txt"]
| Field | Description |
| --- | --- |
| source | Where to fetch weights from. Supports Hugging Face, Baseten Training, S3, GCS, and R2. |
| mount_location | Absolute path where weights appear in your container. |
| allow_patterns | Optional. Only download files matching these patterns. Useful for skipping large files you don’t need. See filtering files. |
| ignore_patterns | Optional. Exclude files matching these patterns. Useful for skipping documentation or unused formats. |
For private or gated models, add an auth section to reference a Baseten secret with your credentials.

Accessing weights in your model

When your model starts, weights are already downloaded and available at your mount_location. The directory structure from the source is preserved:
/models/llama/                           # Your mount_location
├── config.json
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── ...
├── model.safetensors.index.json
├── tokenizer.json
├── tokenizer_config.json
└── original/                            # Subfolders are preserved
    ├── consolidated.00.pth
    └── params.json
Load weights directly from this path in your load() method. No download code needed:
model.py
import torch  # needed for torch.float16 below
from transformers import AutoModelForCausalLM

class Model:
    def load(self):
        # Weights are already available at mount_location
        self._model = AutoModelForCausalLM.from_pretrained(
            "/models/llama",
            torch_dtype=torch.float16,
            device_map="auto"
        )
The mount is read-only. Weights are fetched during truss push and cached, so cold starts only read from local or nearby caches.
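Because filters can exclude files the model needs, it can help to check the mount before loading. The following is a minimal sketch, not part of Truss: `missing_files` and `assert_weights_present` are hypothetical helpers, and the list of required file names is an assumption you would adapt to your model.

```python
from pathlib import Path


def missing_files(mount_location, required):
    """Return the required relative paths that are absent under mount_location."""
    root = Path(mount_location)
    return [name for name in required if not (root / name).is_file()]


def assert_weights_present(mount_location, required):
    """Raise a descriptive error if any required file is missing (hypothetical helper)."""
    missing = missing_files(mount_location, required)
    if missing:
        raise FileNotFoundError(
            f"Missing under {mount_location}: {missing}. "
            "Check allow_patterns and ignore_patterns in config.yaml."
        )
```

Calling `assert_weights_present("/models/llama", ["config.json", "tokenizer.json"])` at the top of `load()` would fail fast with a clear message instead of a loader error later on.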

Custom servers

Custom Docker servers like vLLM and SGLang work directly with BDN. BDN pre-mounts files at mount_location before the container starts, so the start_command reads weights without a separate download step.
config.yaml
base_image:
  image: lmsysorg/sglang:v0.5.8.post1
docker_server:
  start_command: python3 -m sglang.launch_server --model-path /models/qwen
    --served-model-name Qwen/Qwen2.5-3B-Instruct --host 0.0.0.0 --port 8000
  readiness_endpoint: /health
  liveness_endpoint: /health
  predict_endpoint: /v1/chat/completions
  server_port: 8000
weights:
  - source: "hf://Qwen/Qwen2.5-3B-Instruct@aa8e72537993ba99e69dfaafa59ed015b17504d1"
    mount_location: "/models/qwen"
For complete worked examples, see Deploy LLMs with SGLang or Deploy LLMs with vLLM.

Configuration reference

weights

A list of weight sources to mount into your model container.
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    auth:
      auth_method: CUSTOM_SECRET
      auth_secret_name: "hf_access_token"
    allow_patterns: ["*.safetensors", "config.json"]
    ignore_patterns: ["*.md", "*.txt"]
source
string
required
URI specifying where to fetch weights from. Supported schemes:
  • hf://: Hugging Face Hub.
  • bt://: Baseten Training.
  • s3://: AWS S3.
  • gs://: Google Cloud Storage.
  • r2://: Cloudflare R2.
For Hugging Face sources, specify a revision using @revision suffix (branch, tag, or commit SHA).
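To make the URI shape concrete, here is a small illustrative parser. It is a sketch under stated assumptions, not Baseten's implementation: the function names are hypothetical, and it assumes the revision is whatever follows the first `@` in a Hugging Face source.

```python
SUPPORTED_SCHEMES = {"hf", "bt", "s3", "gs", "r2"}


def split_source(source):
    """Split a weights source into (scheme, remainder), rejecting unknown schemes."""
    scheme, sep, rest = source.partition("://")
    if not sep or scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported weights source: {source!r}")
    if not rest:
        raise ValueError(f"source has nothing after the scheme: {source!r}")
    return scheme, rest


def split_hf_revision(rest):
    """Split 'owner/repo@revision' into (repo, revision); revision is None if absent."""
    repo, sep, revision = rest.partition("@")
    return repo, (revision if sep else None)
```

For example, `split_source` on the quick start source yields the `hf` scheme, and `split_hf_revision` then separates `meta-llama/Llama-3.1-8B` from `main`.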
mount_location
string
required
Absolute path where weights will be mounted in your container. Must start with /.
mount_location: "/models/llama"  # Correct
mount_location: "models/llama"   # Wrong - not absolute
auth
object
Authentication configuration for accessing private weight sources. See Source types and authentication for the expected format for each source type.
  • auth_method: The authentication method. Use CUSTOM_SECRET for secret-based auth, AWS_OIDC for AWS OIDC, or GCP_OIDC for GCP OIDC.
  • auth_secret_name: Name of a Baseten secret containing credentials (required for CUSTOM_SECRET).
allow_patterns
string[]
File patterns to include. Uses Unix shell-style wildcards. Only matching files will be downloaded.
allow_patterns:
  - "*.safetensors"
  - "config.json"
  - "tokenizer.*"
Patterns like *.safetensors only match files at the top level. Use **/*.safetensors to match files in subdirectories.
ignore_patterns
string[]
File patterns to exclude. Uses Unix shell-style wildcards. Matching files will be skipped.
ignore_patterns:
  - "*.md"
  - "*.txt"
  - "*.bin"  # Skip PyTorch .bin files if using safetensors

Source types and authentication

For private weight sources, create a Baseten secret with the appropriate credentials. Manage secrets in your Baseten settings.

Hugging Face

Download weights from Hugging Face Hub repositories.
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    auth:
      auth_method: CUSTOM_SECRET
      auth_secret_name: "hf_access_token"  # Required for private/gated repos
    allow_patterns: ["*.safetensors", "config.json"]
Format: hf://owner/repo@revision
  • owner/repo: The Hugging Face repository.
  • @revision: Branch, tag, or commit SHA.
Revision pinning: When you use a branch name like @main, Baseten resolves it to the specific commit SHA at deploy time and mirrors those exact files. Your deployment stays pinned to that version. Subsequent scale-ups won’t pick up new commits. To update to newer weights, push a new deployment.
Authentication: Hugging Face API token (plain text)
| Secret Name | Secret Value |
| --- | --- |
| hf_access_token | hf_xxxxxxxxxxxxxxxx... |
Get your token from Hugging Face settings.

Baseten Training

Load weights from a Baseten Training checkpoint.
config.yaml
weights:
  - source: "bt://my-training-project@job123/checkpoint-1"
    mount_location: "/models/trained"
Format: bt://project[@revision][/checkpoint]
  • project: The name of your Baseten Training project.
  • @revision: Optional. A training job ID or latest. Defaults to latest.
  • /checkpoint: Optional. The checkpoint name within the training job. If omitted, uses the latest checkpoint.
Baseten automatically authenticates with your training project. No auth configuration is required.
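The optional parts of the `bt://` format can be illustrated with a short parser. This is a sketch only: `parse_bt_source` is a hypothetical name, and it assumes project names contain neither `@` nor `/`.

```python
def parse_bt_source(source):
    """Parse bt://project[@revision][/checkpoint] into its three parts."""
    prefix = "bt://"
    if not source.startswith(prefix):
        raise ValueError(f"not a bt:// source: {source!r}")
    head, _, checkpoint = source[len(prefix):].partition("/")
    project, _, revision = head.partition("@")
    if not project:
        raise ValueError(f"missing project name: {source!r}")
    return {
        "project": project,
        # The revision defaults to "latest" when omitted, as described above.
        "revision": revision or "latest",
        # None stands for "use the latest checkpoint".
        "checkpoint": checkpoint or None,
    }
```

With the example above, the parts are project `my-training-project`, revision `job123`, and checkpoint `checkpoint-1`.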

AWS S3

Download weights from a private S3 bucket.
If your model is small (a few GB or less), you can also bundle weights directly with your Truss instead of fetching them from a remote source.

Pick an auth method

AWS S3 supports two authentication paths, both first-class:
  • IAM credentials: Use this if you have an AWS access key pair and want the simplest setup. Skip ahead to the quick start.
  • AWS OIDC: Use this if you want short-lived, narrowly scoped tokens and are comfortable configuring an IAM trust policy in your AWS account. See AWS OIDC.

Quick start with IAM credentials

Use this path when you already have an AWS access key pair for an IAM user or role with read access to your bucket.
  1. Create the secret in Baseten. In your secrets settings, add a secret named aws_credentials with this JSON value:
    {
      "aws_access_key_id": "AKIA...",
      "aws_secret_access_key": "...",
      "aws_region": "us-west-2"
    }
    
    Use these exact key names. Common variations like access_key_id (without the aws_ prefix) cause authentication failures.
  2. Reference the secret from your config.yaml:
    config.yaml
    weights:
      - source: "s3://my-bucket/models/custom-weights"
        mount_location: "/models/custom"
        auth:
          auth_method: CUSTOM_SECRET
          auth_secret_name: "aws_credentials"
    
  3. Grant the IAM user the minimum required permissions on the bucket:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket"],
          "Resource": "arn:aws:s3:::my-bucket"
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": "arn:aws:s3:::my-bucket/models/custom-weights/*"
        }
      ]
    }
    
    The mirror lists objects under your prefix and downloads each file once. No write permissions are needed.
Push the model. The first deploy mirrors weights to Baseten’s blob storage; subsequent deploys reuse the mirror unless the source or filters change. For the full IAM credentials field reference, including optional fields, see IAM credentials.
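If you manage several buckets or prefixes, the policy from step 3 can be generated rather than hand-edited. The sketch below reproduces the same two statements; `read_only_policy` is a hypothetical helper name, and the handling of an empty prefix (granting `GetObject` on the whole bucket) is an assumption.

```python
def read_only_policy(bucket, prefix=""):
    """Build the minimal read-only IAM policy for a bucket and key prefix."""
    prefix = prefix.strip("/")
    objects = f"{prefix}/*" if prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reads are limited to objects under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{objects}",
            },
        ],
    }
```

Pass the result to `json.dumps(..., indent=2)` to get a document you can paste into the IAM console.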

AWS OIDC

OIDC provides short-lived, narrowly scoped tokens for secure authentication without managing long-lived credentials.
  1. Configure AWS to trust the Baseten OIDC provider and create an IAM role with S3 permissions.
  2. Add the OIDC configuration to your config.yaml:
config.yaml
weights:
  - source: "s3://my-bucket/models/custom-weights"
    mount_location: "/models/custom"
    auth:
      auth_method: AWS_OIDC
      aws_oidc_role_arn: arn:aws:iam::<account-id>:role/baseten-s3-access
      aws_oidc_region: us-west-2
No secrets needed! The aws_oidc_role_arn and aws_oidc_region are not sensitive and can be committed to your repository.
See the OIDC authentication guide for detailed setup instructions and best practices.

IAM credentials

config.yaml
weights:
  - source: "s3://my-bucket/models/custom-weights"
    mount_location: "/models/custom"
    auth:
      auth_method: CUSTOM_SECRET
      auth_secret_name: "aws_credentials"
Format: s3://bucket/path
Authentication: JSON with AWS credentials
| Field | Required | Description |
| --- | --- | --- |
| aws_access_key_id | Yes | Access key ID for the IAM user or role. |
| aws_secret_access_key | Yes | Secret access key paired with the access key ID. |
| aws_region | No | Region of the bucket. Defaults to us-east-1. |
| aws_session_token | No | Session token for temporary credentials, such as those issued by AWS STS or aws sso. |
Example secret value with all fields:
{
  "aws_access_key_id": "AKIA...",
  "aws_secret_access_key": "...",
  "aws_region": "us-west-2",
  "aws_session_token": "..."
}
The required fields must use the exact names aws_access_key_id and aws_secret_access_key. Using access_key_id or secret_access_key (without the aws_ prefix) causes authentication failures.
For the minimum required IAM policy, see the quick start.
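Since mistyped key names are a common cause of failures, a local check before saving the secret can catch them early. This is an illustrative sketch, not a Baseten tool: `check_s3_secret` is a hypothetical helper that only inspects key names in the JSON value.

```python
import json

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def check_s3_secret(secret_value):
    """Return a list of problems found in an S3 secret JSON string."""
    data = json.loads(secret_value)
    problems = []
    for key in REQUIRED_KEYS:
        if key in data:
            continue
        bare = key[len("aws_"):]  # e.g. "access_key_id"
        hint = f" (found {bare!r}; add the aws_ prefix)" if bare in data else ""
        problems.append(f"missing required key {key!r}{hint}")
    return problems
```

An empty list means both required keys are present under their exact names.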

Google Cloud Storage

Download weights from a GCS bucket. GCP supports using either service accounts or OIDC for GCS authentication. OIDC provides short-lived, narrowly scoped tokens for secure authentication without managing long-lived credentials.
  1. Configure GCP Workload Identity to trust the Baseten OIDC provider and grant GCS permissions.
  2. Add the OIDC configuration to your config.yaml:
config.yaml
weights:
  - source: "gs://my-bucket/models/weights"
    mount_location: "/models/gcs-weights"
    auth:
      auth_method: GCP_OIDC
      gcp_oidc_service_account: baseten-oidc@my-project.iam.gserviceaccount.com
      gcp_oidc_workload_id_provider: projects/123456789/locations/global/workloadIdentityPools/baseten-pool/providers/baseten-provider
No secrets needed! The service account and workload identity provider are not sensitive and can be committed to your repository.
See the OIDC authentication guide for detailed setup instructions and best practices.

Service account

config.yaml
weights:
  - source: "gs://my-bucket/models/weights"
    mount_location: "/models/gcs-weights"
    auth:
      auth_method: CUSTOM_SECRET
      auth_secret_name: "gcp_service_account"
Format: gs://bucket/path
Authentication: GCP service account JSON key
| Secret Name | Secret Value |
| --- | --- |
| gcp_service_account | {"type": "service_account", "project_id": "...", ...} |
Download from GCP Console under IAM & Admin > Service Accounts.

Cloudflare R2

Download weights from a Cloudflare R2 bucket.
config.yaml
weights:
  - source: "r2://abc123def.my-bucket/models/weights"
    mount_location: "/models/r2-weights"
    auth:
      auth_method: CUSTOM_SECRET
      auth_secret_name: "r2_credentials"
Format: r2://account_id.bucket/path
  • account_id: Your Cloudflare account ID.
  • bucket: R2 bucket name, separated from account_id by a period.
  • path: Path prefix within the bucket.
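The account ID and bucket share the host position of the URI, which is easy to get wrong. The following sketch shows how the three parts separate; `parse_r2_source` is a hypothetical name, and splitting on the first period is an assumption that holds when the account ID itself contains no periods.

```python
def parse_r2_source(source):
    """Parse r2://account_id.bucket/path into (account_id, bucket, path)."""
    prefix = "r2://"
    if not source.startswith(prefix):
        raise ValueError(f"not an r2:// source: {source!r}")
    location, _, path = source[len(prefix):].partition("/")
    account_id, sep, bucket = location.partition(".")
    if not sep or not account_id or not bucket:
        raise ValueError(f"expected r2://account_id.bucket/path, got {source!r}")
    return account_id, bucket, path
```

For the example source above, this returns the account ID `abc123def`, the bucket `my-bucket`, and the path `models/weights`.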
Authentication: JSON with R2 API credentials
| Secret Name | Secret Value |
| --- | --- |
| r2_credentials | {"aws_access_key_id": "...", "aws_secret_access_key": "..."} |
Get your R2 API tokens from the Cloudflare dashboard under R2 > Manage R2 API Tokens.

Best practices

Pin to specific commits

Avoid using branch names like @main in production. While Baseten pins to the commit SHA at deploy time, using @main means each new deployment may get different weights, making debugging and rollbacks difficult.
Always pin to a specific commit SHA for reproducible deployments:
# Recommended - reproducible across deploys
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@5206a32e7b8a9f1c..."
    mount_location: "/models/llama"

# Not recommended for production - each new deployment may resolve to a different commit
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
To find the current commit SHA for a Hugging Face repo:
# Using the Hugging Face CLI
huggingface-cli repo-info meta-llama/Llama-3.1-8B --revision main
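As an alternative, the commit SHA can be read from the Hub's public REST API with only the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, the `token` argument is only needed for repositories that require authentication, and `fetch` is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from urllib.parse import quote

HUB_ENDPOINT = "https://huggingface.co"


def revision_url(repo_id, revision="main"):
    """URL of the Hub API resource describing a model repo at a given revision."""
    return f"{HUB_ENDPOINT}/api/models/{repo_id}/revision/{quote(revision, safe='')}"


def fetch_json(url, token=None):
    """GET a URL and decode the JSON body, optionally sending a bearer token."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def resolve_commit_sha(repo_id, revision="main", fetch=fetch_json):
    """Return the commit SHA that a branch or tag currently points to."""
    return fetch(revision_url(repo_id, revision))["sha"]
```

Calling `resolve_commit_sha("meta-llama/Llama-3.1-8B")` performs one HTTP request; paste the returned SHA after the `@` in your source.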

Filter files with patterns

Only download what you need to minimize cold start time:
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
    allow_patterns:
      - "*.safetensors"    # Model weights
      - "config.json"      # Model config
      - "tokenizer.*"      # Tokenizer files
    ignore_patterns:
      - "*.bin"            # Skip PyTorch format if using safetensors
      - "*.md"             # Skip documentation
      - "*.txt"            # Skip text files
Patterns like *.safetensors only match files at the top level of the source. To match files in subdirectories, use **/*.safetensors.
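To preview which files a set of patterns would select, you can model the rules locally. The sketch below is an approximation of the behavior described here, not Baseten's matcher: it assumes `*` and `?` never cross a `/`, that `**/` matches zero or more directories, and that a file is kept when it matches any allow pattern (if given) and no ignore pattern.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex where '*' stays within one path segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directories (assumption)
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_selected(path, allow=None, ignore=None):
    """Apply allow patterns first, then ignore patterns, to a relative path."""
    if allow and not any(glob_to_regex(p).match(path) for p in allow):
        return False
    if ignore and any(glob_to_regex(p).match(path) for p in ignore):
        return False
    return True
```

Running `is_selected` over a repository's file listing shows, for instance, that a file under `original/` is skipped by `*.safetensors` but kept by `**/*.safetensors`.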

Use absolute mount paths

The mount_location must be an absolute path (starting with /):
# Correct
mount_location: "/models/llama"
mount_location: "/app/model_cache/my-model"

# Wrong - will fail validation
mount_location: "models/llama"
mount_location: "./my-model"

Keep mount locations unique

Each weight source must have a unique mount_location:
# Correct - different paths
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/models/llama"
  - source: "hf://sentence-transformers/all-MiniLM-L6-v2@main"
    mount_location: "/models/embeddings"

# Wrong - duplicate paths will fail
weights:
  - source: "hf://model-a@main"
    mount_location: "/models/shared"
  - source: "hf://model-b@main"
    mount_location: "/models/shared"
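Both mount rules (absolute paths and uniqueness) are easy to check before pushing. The following is a minimal sketch, not the validation Truss itself runs: `validate_weights` is a hypothetical helper operating on the parsed `weights` list, and treating `/models/shared` and `/models/shared/` as the same location is an assumption.

```python
import posixpath


def validate_weights(entries):
    """Return error messages for non-absolute or duplicate mount locations."""
    errors = []
    seen = {}
    for index, entry in enumerate(entries):
        mount = entry.get("mount_location", "")
        if not mount.startswith("/"):
            errors.append(f"weights[{index}]: mount_location {mount!r} is not absolute")
            continue
        normalized = posixpath.normpath(mount)
        if normalized in seen:
            errors.append(
                f"weights[{index}]: mount_location {mount!r} duplicates weights[{seen[normalized]}]"
            )
        else:
            seen[normalized] = index
    return errors
```

An empty result means every entry has an absolute, distinct `mount_location`.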

When weights are re-mirrored

Baseten caches weights based on a hash of their configuration and reuses cached weights when possible to avoid redundant downloads.
Deduplication and mutation detection: Baseten deduplicates files based on their etag (a content hash), not just filename, and only re-mirrors files that have been mutated since the last pull. Unchanged files are reused from blob storage, even across deployments.
Changes that trigger re-mirroring:
| Field | Re-mirrors? | Why |
| --- | --- | --- |
| source | ✅ Yes | Different repository, revision, or path |
| allow_patterns | ✅ Yes | Different files will be downloaded |
| ignore_patterns | ✅ Yes | Different files will be downloaded |
Changes that do NOT trigger re-mirroring:
| Field | Re-mirrors? | Why |
| --- | --- | --- |
| auth | ❌ No | Credentials don’t affect which files are mirrored |
| mount_location | ❌ No | Only affects where weights appear in your container |
To force a fresh download of weights that haven’t changed, modify the source to point to a specific commit SHA instead of a branch name, or add a trivial change to allow_patterns.
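The rule of thumb is that only fields affecting which files are mirrored feed the cache key. The sketch below illustrates that idea; it is not Baseten's actual hashing scheme, and `mirror_key` along with the choice of SHA-256 over canonical JSON are assumptions made for the example.

```python
import hashlib
import json

# Fields that determine which files get mirrored.
MIRROR_FIELDS = ("source", "allow_patterns", "ignore_patterns")


def mirror_key(entry):
    """Hash only the fields that change the mirrored file set."""
    relevant = {field: entry.get(field) for field in MIRROR_FIELDS}
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Under this model, changing `auth` or `mount_location` leaves the key unchanged, while editing `source` or either pattern list produces a new key and therefore a new mirror.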

How it works

What happens when you truss push

Your truss push command returns immediately after the deployment is created in Baseten. The mirroring process runs in the background, but your model will not be deployed to the Workload Plane until mirroring completes. This ensures weights are available before your model pod starts.

What happens on cold start

Baseten runs multiple Workload Planes across regions and clusters. Each Workload Plane has its own in-cluster cache for fast weight delivery.
When your model pod starts:
  1. The BDN Agent on the node fetches the manifest for your weights.
  2. Weights are downloaded through the In-Cluster Cache (shared across pods in the cluster).
  3. Weights are stored in the Node Cache (part of the BDN Agent, shared across pods on the same node).
  4. Weights are mounted read-only to your model pod.
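The lookup order in the steps above can be modeled as a read-through cache with two tiers in front of the mirror. This is a toy simulation for intuition only: the class, its dictionaries, and the tier labels are all hypothetical and say nothing about how the BDN Agent is implemented.

```python
class TieredCache:
    """Toy model: node cache -> in-cluster cache -> mirrored blob storage."""

    def __init__(self, mirror):
        self.mirror = mirror      # mirrored weights: {key: bytes}
        self.cluster = {}         # shared by all nodes in the cluster
        self.nodes = {}           # node name -> that node's cache
        self.mirror_reads = 0

    def fetch(self, node, key):
        """Return (data, tier) where tier names the layer that served the read."""
        node_cache = self.nodes.setdefault(node, {})
        if key in node_cache:
            return node_cache[key], "node"
        if key in self.cluster:
            data, tier = self.cluster[key], "cluster"
        else:
            data, tier = self.mirror[key], "mirror"
            self.mirror_reads += 1
            self.cluster[key] = data  # populate the in-cluster cache
        node_cache[key] = data        # populate the node cache
        return data, tier
```

In this model, the first pod in a cluster reads from the mirror, a second pod on the same node is served by the node cache, and a pod on a different node is served by the in-cluster cache.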

Key benefits

  • Non-blocking push → truss push returns immediately; mirroring happens in the background.
  • One-time mirroring → Weights are mirrored to Baseten storage once, not on every cold start.
  • No upstream dependency at runtime → Once mirrored, scale-ups and inference never contact the original source.
  • Multi-tier caching → In-cluster cache prevents redundant downloads; node cache provides instant access for subsequent replicas.
  • Deduplication → Identical weight files are stored once and shared via hardlinks.
  • Parallel downloads → Large models download faster with concurrent chunk fetching.

BDN proxy

BDN proxy is available by request. Contact us to enable it for your organization.
If your model downloads weights in application code rather than using the weights config, BDN proxy can accelerate those downloads. When enabled, Baseten routes your model container’s outbound HTTP(S) requests through a distributed caching proxy that caches downloads across cluster nodes. Subsequent replicas and scale-ups serve from cache instead of re-downloading from the origin.
BDN proxy is transparent. You don’t need to change your model code. Baseten sets the following environment variables on your container:
| Environment variable | Purpose |
| --- | --- |
| BDN_PROXY | Proxy address. |
| REQUESTS_CA_BUNDLE | CA bundle for Python requests and other TLS clients. |
| SSL_CERT_FILE | CA bundle for general SSL/TLS clients. |
| PIP_CERT | CA bundle for pip. |
BDN proxy does not set HTTP_PROXY or HTTPS_PROXY. If your model code requires an explicit proxy, use the BDN_PROXY environment variable.
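If your code does need an explicit proxy, one way to wire it up with the standard library is shown below. This is a sketch under assumptions: the exact format of the `BDN_PROXY` value is not specified here, so the example assumes it is an address that `urllib` accepts as a proxy (such as a URL), and the helper names are hypothetical.

```python
import os
import urllib.request


def proxy_mapping(env):
    """Map both HTTP and HTTPS traffic to BDN_PROXY, or return {} if it is unset."""
    proxy = env.get("BDN_PROXY")
    return {"http": proxy, "https": proxy} if proxy else {}


def make_opener(env=os.environ):
    """Build a urllib opener that routes through BDN_PROXY when it is present."""
    proxies = proxy_mapping(env)
    if not proxies:
        return urllib.request.build_opener()  # default behavior, no explicit proxy
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

Use `make_opener().open(url)` in place of `urllib.request.urlopen(url)` in your download code. Other HTTP clients take the same address through their own proxy settings.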

Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| aws_access_key_id and aws_secret_access_key are required in S3 credentials | Secret JSON uses incorrect key names like access_key_id instead of aws_access_key_id. | Use the exact key names aws_access_key_id, aws_secret_access_key, and aws_region in your secret JSON. |
| secret_id is required | Your weights: source is s3:// or r2:// but the config has no auth: block, so the mirror can’t resolve credentials. Less commonly, the named secret was deleted or hasn’t propagated yet. | Add an auth: block to the source, like auth: { auth_method: CUSTOM_SECRET, auth_secret_name: <secret-name> }. See AWS S3 or Cloudflare R2 for the per-source format. If the auth: block is already present, recreate the secret with a new name and redeploy. |
| no credentials configured: need either OIDC config or secret_id | Your weights: source is gs:// but the config has no auth: block. | Add an auth: block with either auth_method: GCP_OIDC and the OIDC fields, or auth_method: CUSTOM_SECRET and an auth_secret_name. See Google Cloud Storage. |
| Weights download silently skips files in subdirectories | allow_patterns uses a flat glob like *.safetensors that only matches at the top level. | Use **/*.safetensors for recursive matching across subdirectories. |
| Weights download completes but model fails to load | Required files like config.json or tokenizer files are excluded by patterns. | Add config.json and tokenizer.* to allow_patterns. |

Migration from model_cache

model_cache is deprecated. Migrate to weights for faster cold starts through multi-tier caching.

Automated migration with truss migrate

The truss migrate CLI command automatically converts model_cache configurations:
# Run in your Truss directory
truss migrate

# Or specify a directory
truss migrate /path/to/truss
The command will:
  1. Show a colorized diff of the proposed changes.
  2. Prompt for confirmation before applying.
  3. Create a backup of your original config.yaml.
  4. Warn about any model.py path changes needed.

Manual migration reference

From model_cache to weights:
| model_cache | weights |
| --- | --- |
| repo_id: "owner/repo" | source: "hf://owner/repo@rev" |
| revision: "main" | Included in source URI as @main |
| kind: "s3" | Prefix: s3://bucket/path |
| kind: "gcs" | Prefix: gs://bucket/path |
| kind: "r2" | Prefix: r2://account_id.bucket/path |
| volume_folder: "name" | mount_location: "/app/model_cache/name" |
| runtime_secret_name | auth.auth_secret_name |
| allow_patterns | allow_patterns (same) |
| ignore_patterns | ignore_patterns (same) |
Example migration:
config.yaml
weights:
  - source: "hf://meta-llama/Llama-3.1-8B@main"
    mount_location: "/app/model_cache/llama"
    allow_patterns:
      - "*.safetensors"
      - "config.json"
    auth:
      auth_method: CUSTOM_SECRET
      auth_secret_name: hf_access_token
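The mapping table can also be expressed as a small conversion function. This sketch covers only Hugging Face entries and is not how `truss migrate` is implemented: `model_cache_to_weights` is a hypothetical helper, and falling back to `main` when `revision` is missing is an assumption.

```python
def model_cache_to_weights(entry):
    """Convert one Hugging Face model_cache entry (a dict) into a weights entry."""
    if entry.get("kind", "hf") != "hf":
        raise NotImplementedError("this sketch only handles Hugging Face entries")
    revision = entry.get("revision", "main")  # assumed default
    weights = {
        "source": f"hf://{entry['repo_id']}@{revision}",
        # Keeping the old path means model.py needs no path changes.
        "mount_location": f"/app/model_cache/{entry['volume_folder']}",
    }
    for field in ("allow_patterns", "ignore_patterns"):
        if entry.get(field):
            weights[field] = list(entry[field])
    if entry.get("runtime_secret_name"):
        weights["auth"] = {
            "auth_method": "CUSTOM_SECRET",
            "auth_secret_name": entry["runtime_secret_name"],
        }
    return weights
```

Applied to a `model_cache` entry for `meta-llama/Llama-3.1-8B` with `volume_folder: "llama"`, this produces the same fields as the example above.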

Chains migration

For Truss Chains, update Assets.cached to Assets.weights in your Python code:
import truss_chains as chains
from truss.base import truss_config

class MyChainlet(chains.ChainletBase):
    remote_config = chains.RemoteConfig(
        assets=chains.Assets(
            weights=[
                truss_config.WeightsSource(
                    source="hf://meta-llama/Llama-3.1-8B@main",
                    mount_location="/app/model_cache/llama",
                    auth_secret_name="hf_access_token",
                    allow_patterns=["*.safetensors", "config.json"],
                )
            ],
            secret_keys=["hf_access_token"],
        ),
    )
Key changes:
  • ModelRepo → WeightsSource.
  • repo_id + revision → source URI with @revision suffix.
  • volume_folder → mount_location (must be absolute path).
  • runtime_secret_name → auth.auth_secret_name (inside an auth block with auth_method: CUSTOM_SECRET).
  • Remove use_volume and kind (inferred from URI scheme).

Custom server migration

When migrating an existing custom server deployment from model_cache to weights:
  1. Remove truss-transfer-cli from your start_command. Files are pre-mounted before the container starts.
  2. Update file paths from /app/model_cache/{volume_folder} to your new mount_location.
config.yaml
docker_server:
  # No truss-transfer-cli needed - weights are pre-mounted
  start_command: text-embeddings-router --port 7997
    --model-id /models/jina --max-client-batch-size 128
weights:
  - source: "hf://jinaai/jina-embeddings-v2-base-code@516f4baf..."
    mount_location: "/models/jina"
    ignore_patterns: ["*.onnx"]
The Custom servers section shows the pattern for new deployments.

Automatic use with engine builders

Engine-builder deployments use BDN automatically. No weights block is required, and no configuration changes are needed when migrating an existing engine-builder deployment.
| Engine | When BDN is used |
| --- | --- |
| BEI | Every deploy. |
| Briton (Engine-Builder-LLM) | Every deploy. |
| BIS-LLM (V2) | Every deploy. |
Build artifacts are mirrored once and served from the same multi-tier caches described in How it works.
