Baseten Delivery Network (BDN) reduces cold start times by mirroring your model weights to Baseten’s infrastructure and caching them close to your pods. Instead of downloading hundreds of gigabytes from Hugging Face, S3, or GCS on every scale-up, BDN mirrors weights once and serves them from multi-tier caches. Configure BDN using the `weights` key in your config.
This works with both Model class deployments and custom Docker images.
- Get started: Add weights to a new model.
- Custom servers: Use with vLLM, SGLang, and more.
- Migrate: Move from `model_cache`.

Quick start
Add a `weights` section to your `config.yaml`:
config.yaml
| Field | Description |
|---|---|
| `source` | Where to fetch weights from. Supports Hugging Face, Baseten Training, S3, GCS, and R2. |
| `mount_location` | Absolute path where weights appear in your container. |
| `allow_patterns` | Optional. Only download files matching these patterns. Useful for skipping large files you don’t need. See filtering files. |
| `ignore_patterns` | Optional. Exclude files matching these patterns. Useful for skipping documentation or unused formats. |

For private weight sources, add an `auth` section to reference a Baseten secret with your credentials.
Accessing weights in your model
When your model starts, weights are already downloaded and available at your `mount_location`.
The directory structure from the source is preserved:
Read the weights from `mount_location` in your `load()` method. No download code needed:
model.py
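Because the weights are already on disk when the pod starts, `load()` only has to read files from the mount path. The following is a minimal sketch of that pattern, not Baseten's reference implementation: the path `/models/my-model` is a hypothetical `mount_location`, and the class only records which files it finds so that it runs without any ML framework installed.

```python
from pathlib import Path

# Hypothetical mount path; use the mount_location from your own config.yaml.
WEIGHTS_DIR = Path("/models/my-model")


class Model:
    def __init__(self, **kwargs):
        self._weights_dir = WEIGHTS_DIR
        self._files = []

    def load(self):
        # The weights are pre-mounted, so load() reads from disk and downloads nothing.
        if not self._weights_dir.is_dir():
            raise FileNotFoundError(f"Expected weights at {self._weights_dir}")
        # The source directory structure is preserved, so walk it recursively.
        self._files = sorted(
            p.relative_to(self._weights_dir).as_posix()
            for p in self._weights_dir.rglob("*")
            if p.is_file()
        )
        # A real model would pass self._weights_dir to its framework's loader here.

    def predict(self, model_input):
        return {"weight_files": self._files}
```

In a real deployment you would replace the file listing with your framework's loading call, pointed at the same directory.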
Weights are mirrored when you run `truss push` and cached, so cold starts only read from local or nearby caches.
Custom servers
Custom Docker servers like vLLM and SGLang work directly with BDN. BDN pre-mounts files at `mount_location` before the container starts, so the `start_command` reads weights without a separate download step.
config.yaml
Configuration reference
weights
A list of weight sources to mount into your model container.
config.yaml
- `source`: URI specifying where to fetch weights from. Supported schemes:
  - `hf://`: Hugging Face Hub.
  - `bt://`: Baseten Training.
  - `s3://`: AWS S3.
  - `gs://`: Google Cloud Storage.
  - `r2://`: Cloudflare R2.

  Hugging Face sources can be pinned with an `@revision` suffix (branch, tag, or commit SHA).
- `mount_location`: Absolute path where weights will be mounted in your container. Must start with `/`.
- `auth`: Authentication configuration for accessing private weight sources. See Source types and authentication for the expected format for each source type.
  - `auth_method`: The authentication method. Use `CUSTOM_SECRET` for secret-based auth, `AWS_OIDC` for AWS OIDC, or `GCP_OIDC` for GCP OIDC.
  - `auth_secret_name`: Name of a Baseten secret containing credentials (required for `CUSTOM_SECRET`).
- `allow_patterns`: File patterns to include. Uses Unix shell-style wildcards. Only matching files will be downloaded. Patterns like `*.safetensors` only match files at the top level. Use `**/*.safetensors` to match files in subdirectories.
- `ignore_patterns`: File patterns to exclude. Uses Unix shell-style wildcards. Matching files will be skipped.
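The difference between top-level and recursive patterns is easy to get wrong, so it can help to preview which files a pattern set would select. The sketch below is a hypothetical local approximation of the semantics described above, not Baseten's matcher: it assumes `*` never crosses a `/` and that a leading `**/` matches zero or more directories. Python's built-in `fnmatch` is deliberately not used, because its `*` also matches `/`.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob where `*` stays in one path segment and `**/` spans directories."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more directories (assumption)
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any run of characters except a path separator
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def select_files(paths, allow_patterns=None, ignore_patterns=None):
    """Keep paths matching any allow pattern and no ignore pattern."""
    allow = [pattern_to_regex(p) for p in (allow_patterns or ["**/*"])]
    ignore = [pattern_to_regex(p) for p in (ignore_patterns or [])]
    return [
        path
        for path in paths
        if any(rx.match(path) for rx in allow)
        and not any(rx.match(path) for rx in ignore)
    ]
```

For example, calling `select_files(files, ["*.safetensors"])` on a repository that stores shards under a subdirectory returns no shards, which is the silent-skip behavior the note above warns about.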
Source types and authentication
For private weight sources, create a Baseten secret with the appropriate credentials. Manage secrets in your Baseten settings.

Hugging Face

Download weights from Hugging Face Hub repositories.

config.yaml

hf://owner/repo@revision

- `owner/repo`: The Hugging Face repository.
- `@revision`: Branch, tag, or commit SHA.

Revision pinning: When you use a branch name like `@main`, Baseten resolves it to the specific commit SHA at deploy time and mirrors those exact files. Your deployment stays pinned to that version. Subsequent scale-ups won’t pick up new commits. To update to newer weights, push a new deployment.

| Secret Name | Secret Value |
|---|---|
| `hf_access_token` | `hf_xxxxxxxxxxxxxxxx...` |
Baseten Training
Load weights from a Baseten Training checkpoint.

config.yaml

bt://project[@revision][/checkpoint]

- `project`: The name of your Baseten Training project.
- `@revision`: Optional. A training job ID or `latest`. Defaults to `latest`.
- `/checkpoint`: Optional. The checkpoint name within the training job. If omitted, uses the latest checkpoint.

Baseten automatically authenticates with your training project. No `auth` configuration is required.

AWS S3
Download weights from a private S3 bucket.

Pick an auth method

AWS S3 supports two authentication paths, both first-class:

- IAM credentials: Use this if you have an AWS access key pair and want the simplest setup. Skip ahead to the quick start.
- AWS OIDC: Use this if you want short-lived, narrowly scoped tokens and are comfortable configuring an IAM trust policy in your AWS account. See AWS OIDC.
Quick start with IAM credentials
Use this path when you already have an AWS access key pair for an IAM user or role with read access to your bucket.

- Create the secret in Baseten. In your secrets settings, add a secret named `aws_credentials` with this JSON value:
  Use these exact key names. Common variations like `access_key_id` (without the `aws_` prefix) cause authentication failures.
- Reference the secret from your `config.yaml`:
  config.yaml
- Grant the IAM user the minimum required permissions on the bucket:
The mirror lists objects under your prefix and downloads each file once. No write permissions are needed.
AWS OIDC
OIDC provides short-lived, narrowly scoped tokens for secure authentication without managing long-lived credentials.

- Configure AWS to trust the Baseten OIDC provider and create an IAM role with S3 permissions.
- Add the OIDC configuration to your `config.yaml`:

config.yaml

No secrets needed! The `aws_oidc_role_arn` and `aws_oidc_region` are not sensitive and can be committed to your repository.

IAM credentials
config.yaml
s3://bucket/path
Authentication: JSON with AWS credentials
| Field | Required | Description |
|---|---|---|
| `aws_access_key_id` | Yes | Access key ID for the IAM user or role. |
| `aws_secret_access_key` | Yes | Secret access key paired with the access key ID. |
| `aws_region` | No | Region of the bucket. Defaults to `us-east-1`. |
| `aws_session_token` | No | Session token for temporary credentials, such as those issued by AWS STS or `aws sso`. |
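Since misnamed keys are a common cause of authentication failures, you may want to check the secret JSON locally before saving it. The following is a hypothetical pre-check built only from the field table above. It is not part of Truss or the Baseten CLI, and treating any unlisted key as a problem is this sketch's own assumption.

```python
import json

# Key names taken from the field table above.
REQUIRED_KEYS = {"aws_access_key_id", "aws_secret_access_key"}
OPTIONAL_KEYS = {"aws_region", "aws_session_token"}


def check_s3_secret(secret_json: str) -> list:
    """Return a list of problems found in the secret JSON (empty if none)."""
    keys = set(json.loads(secret_json))
    problems = [f"missing required key: {key}" for key in sorted(REQUIRED_KEYS - keys)]
    # Flag keys outside the documented set, such as access_key_id without the aws_ prefix.
    problems += [
        f"unexpected key: {key}"
        for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS)
    ]
    return problems
```

A secret that uses `access_key_id` instead of `aws_access_key_id` produces both a missing-key and an unexpected-key message, which points directly at the rename.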
Google Cloud Storage
Download weights from a GCS bucket. GCS sources support either service accounts or OIDC for authentication.

GCP OIDC (Recommended)

OIDC provides short-lived, narrowly scoped tokens for secure authentication without managing long-lived credentials.

- Configure GCP Workload Identity to trust the Baseten OIDC provider and grant GCS permissions.
- Add the OIDC configuration to your `config.yaml`:
config.yaml
No secrets needed! The service account and workload identity provider are not sensitive and can be committed to your repository.
Service account
config.yaml
gs://bucket/path
Authentication: GCP service account JSON key
| Secret Name | Secret Value |
|---|---|
| `gcp_service_account` | `{"type": "service_account", "project_id": "...", ...}` |
Cloudflare R2
Download weights from a Cloudflare R2 bucket.

config.yaml

r2://account_id.bucket/path

- `account_id`: Your Cloudflare account ID.
- `bucket`: R2 bucket name, separated from `account_id` by a period.
- `path`: Path prefix within the bucket.
| Secret Name | Secret Value |
|---|---|
| `r2_credentials` | `{"aws_access_key_id": "...", "aws_secret_access_key": "..."}` |
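The `r2://account_id.bucket/path` layout packs three values into one string, and the period separator is easy to overlook. As an illustration only, the sketch below splits such a URI into its parts. The function and type names are hypothetical, and the split on the first period assumes the account ID itself contains no period.

```python
from typing import NamedTuple


class R2Source(NamedTuple):
    account_id: str
    bucket: str
    path: str


def parse_r2_source(source: str) -> R2Source:
    """Split r2://account_id.bucket/path into its three components."""
    scheme = "r2://"
    if not source.startswith(scheme):
        raise ValueError(f"expected an r2:// URI, got {source!r}")
    # Everything before the first "/" is "account_id.bucket"; the rest is the path prefix.
    location, _, path = source[len(scheme):].partition("/")
    account_id, sep, bucket = location.partition(".")
    if not (account_id and sep and bucket):
        raise ValueError("expected r2://account_id.bucket/path")
    return R2Source(account_id, bucket, path)
```

A URI that omits the account ID, such as `r2://my-bucket/models`, raises a `ValueError` instead of being silently misread.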
Best practices
Pin to specific commits
Always pin to a specific commit SHA for reproducible deployments:

Filter files with patterns

Only download what you need to minimize cold start time:

Use absolute mount paths

The `mount_location` must be an absolute path (starting with `/`):

Keep mount locations unique

Each weight source must have a unique `mount_location`:
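The two mount path rules can be checked before you push. The following is a hypothetical local pre-check, not something Truss provides: it assumes the `weights` list has already been loaded into Python as a list of dictionaries with a `mount_location` key.

```python
def check_mount_locations(weights: list) -> list:
    """Return a list of rule violations for the mount_location values."""
    problems = []
    seen = set()
    for entry in weights:
        location = entry["mount_location"]
        # Rule 1: the path must be absolute.
        if not location.startswith("/"):
            problems.append(f"{location}: mount_location must be an absolute path")
        # Rule 2: no two sources may share a path.
        if location in seen:
            problems.append(f"{location}: mount_location is used by more than one source")
        seen.add(location)
    return problems
```

An empty result means both rules hold for every entry in the list.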
When weights are re-mirrored
Baseten caches weights based on a hash of their configuration and reuses cached weights when possible to avoid redundant downloads.

Deduplication and mutation detection: Baseten deduplicates files based on their etag (a content hash), not just filename, and only re-mirrors files that have been mutated since the last pull. Unchanged files are reused from blob storage, even across deployments.

Changes that trigger re-mirroring:

| Field | Re-mirrors? | Why |
|---|---|---|
| `source` | ✅ Yes | Different repository, revision, or path |
| `allow_patterns` | ✅ Yes | Different files will be downloaded |
| `ignore_patterns` | ✅ Yes | Different files will be downloaded |

Changes that don’t trigger re-mirroring:

| Field | Re-mirrors? | Why |
|---|---|---|
| `auth` | ❌ No | Credentials don’t affect which files are mirrored |
| `mount_location` | ❌ No | Only affects where weights appear in your container |
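One way to picture the two tables is a cache key computed only from the fields that change which files are mirrored. The sketch below is purely illustrative: Baseten's actual hashing scheme is not documented here, and both the SHA-256 choice and the `mirror_key` name are assumptions.

```python
import hashlib
import json

# Only these fields change which files are mirrored, per the tables above.
MIRROR_FIELDS = ("source", "allow_patterns", "ignore_patterns")


def mirror_key(entry: dict) -> str:
    """Hash the fields that affect mirroring; auth and mount_location are ignored."""
    relevant = {field: entry.get(field) for field in MIRROR_FIELDS}
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Two entries that differ only in `mount_location` or `auth` produce the same key, while changing `allow_patterns` produces a new one.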
How it works
What happens when you truss push
Your truss push command returns immediately after the deployment is created in Baseten. The mirroring process runs in the background, but your model will not be deployed to the Workload Plane until mirroring completes. This ensures weights are available before your model pod starts.
What happens on cold start
Baseten runs multiple Workload Planes across regions and clusters. Each Workload Plane has its own in-cluster cache for fast weight delivery.

When your model pod starts:

- The BDN Agent on the node fetches the manifest for your weights.
- Weights are downloaded through the In-Cluster Cache (shared across pods in the cluster).
- Weights are stored in the Node Cache (part of the BDN Agent, shared across pods on the same node).
- Weights are mounted read-only to your model pod.
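The cold start steps describe a read-through cache with two tiers in front of the mirrored storage. The toy model below illustrates that idea under stated assumptions: the class names are hypothetical, plain dictionaries stand in for real storage, and nothing here reflects how the BDN Agent is actually implemented.

```python
class ClusterCache:
    """Stands in for the in-cluster cache shared by all nodes in a cluster."""

    def __init__(self, mirror: dict):
        self.mirror = mirror  # stands in for weights mirrored to Baseten storage
        self.files = {}
        self.mirror_reads = 0

    def get(self, name: str) -> bytes:
        if name not in self.files:
            # Miss: read from mirrored storage once and keep the result.
            self.mirror_reads += 1
            self.files[name] = self.mirror[name]
        return self.files[name]


class NodeCache:
    """Stands in for the node cache shared by pods on the same node."""

    def __init__(self, cluster: ClusterCache):
        self.cluster = cluster
        self.files = {}
        self.cluster_reads = 0

    def get(self, name: str) -> bytes:
        if name not in self.files:
            # Miss: fetch through the in-cluster cache and keep the result.
            self.cluster_reads += 1
            self.files[name] = self.cluster.get(name)
        return self.files[name]
```

In this model, a second replica on the same node is served from the node cache, and a replica on a new node is served from the in-cluster cache, so the mirrored storage is read only once per cluster.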
Key benefits
- Non-blocking push → `truss push` returns immediately; mirroring happens in the background.
- One-time mirroring → Weights are mirrored to Baseten storage once, not on every cold start.
- No upstream dependency at runtime → Once mirrored, scale-ups and inference never contact the original source.
- Multi-tier caching → In-cluster cache prevents redundant downloads; node cache provides instant access for subsequent replicas.
- Deduplication → Identical weight files are stored once and shared via hardlinks.
- Parallel downloads → Large models download faster with concurrent chunk fetching.
BDN proxy
BDN proxy is available by request. Contact us to enable it for your organization.

If your model downloads files at runtime that aren’t covered by your `weights` config, BDN proxy can accelerate those downloads. When enabled, Baseten routes your model container’s outbound HTTP(S) requests through a distributed caching proxy that caches downloads across cluster nodes. Subsequent replicas and scale-ups serve from cache instead of re-downloading from the origin.
BDN proxy is transparent. You don’t need to change your model code. Baseten sets the following environment variables on your container:
| Environment variable | Purpose |
|---|---|
| `BDN_PROXY` | Proxy address. |
| `REQUESTS_CA_BUNDLE` | CA bundle for Python `requests` and other TLS clients. |
| `SSL_CERT_FILE` | CA bundle for general SSL/TLS clients. |
| `PIP_CERT` | CA bundle for `pip`. |
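If a client in your model code needs the proxy passed explicitly, the `BDN_PROXY` variable from the table can be read at startup. The sketch below uses only the Python standard library and rests on an assumption: that the value of `BDN_PROXY` is an address `urllib` accepts as a proxy URL, which the table does not specify. The helper names are hypothetical.

```python
import os
import urllib.request


def bdn_proxy_mapping() -> dict:
    """Return a scheme-to-proxy mapping, or an empty dict when BDN proxy is not enabled."""
    proxy = os.environ.get("BDN_PROXY")
    if not proxy:
        return {}
    # Assumption: the same address serves both HTTP and HTTPS traffic.
    return {"http": proxy, "https": proxy}


def build_opener_with_bdn_proxy() -> urllib.request.OpenerDirector:
    """Build a urllib opener that routes requests through BDN proxy when it is set."""
    mapping = bdn_proxy_mapping()
    if not mapping:
        # No proxy configured, so fall back to urllib's default opener.
        return urllib.request.build_opener()
    return urllib.request.build_opener(urllib.request.ProxyHandler(mapping))
```

The same mapping has the shape that the `requests` library accepts in its `proxies` argument, should your code use that client instead.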

BDN proxy does not set `HTTP_PROXY` or `HTTPS_PROXY`. If your model code requires an explicit proxy, use the `BDN_PROXY` environment variable.

Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| `aws_access_key_id and aws_secret_access_key are required in S3 credentials` | Secret JSON uses incorrect key names like `access_key_id` instead of `aws_access_key_id`. | Use the exact key names `aws_access_key_id`, `aws_secret_access_key`, and `aws_region` in your secret JSON. |
| `secret_id is required` | Your `weights:` source is `s3://` or `r2://` but the config has no `auth:` block, so the mirror can’t resolve credentials. Less commonly, the named secret was deleted or hasn’t propagated yet. | Add an `auth:` block to the source, like `auth: { auth_method: CUSTOM_SECRET, auth_secret_name: <secret-name> }`. See AWS S3 or Cloudflare R2 for the per-source format. If the `auth:` block is already present, recreate the secret with a new name and redeploy. |
| `no credentials configured: need either OIDC config or secret_id` | Your `weights:` source is `gs://` but the config has no `auth:` block. | Add an `auth:` block with either `auth_method: GCP_OIDC` and the OIDC fields, or `auth_method: CUSTOM_SECRET` and an `auth_secret_name`. See Google Cloud Storage. |
| Weights download silently skips files in subdirectories | `allow_patterns` uses a flat glob like `*.safetensors` that only matches at the top level. | Use `**/*.safetensors` for recursive matching across subdirectories. |
| Weights download completes but model fails to load | Required files like `config.json` or tokenizer files are excluded by patterns. | Add `config.json` and `tokenizer.*` to `allow_patterns`. |
Migration from model_cache
Automated migration with truss migrate
The truss migrate CLI command automatically converts model_cache configurations:
- Show a colorized diff of the proposed changes.
- Prompt for confirmation before applying.
- Create a backup of your original `config.yaml`.
- Warn about any `model.py` path changes needed.
Manual migration reference
From `model_cache` to `weights`:

| `model_cache` | `weights` |
|---|---|
| `repo_id: "owner/repo"` | `source: "hf://owner/repo@rev"` |
| `revision: "main"` | Included in `source` URI as `@main` |
| `kind: "s3"` | Prefix: `s3://bucket/path` |
| `kind: "gcs"` | Prefix: `gs://bucket/path` |
| `kind: "r2"` | Prefix: `r2://account_id.bucket/path` |
| `volume_folder: "name"` | `mount_location: "/app/model_cache/name"` |
| `runtime_secret_name` | `auth.auth_secret_name` |
| `allow_patterns` | `allow_patterns` (same) |
| `ignore_patterns` | `ignore_patterns` (same) |
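To make the mapping concrete, the sketch below applies the table to a single Hugging Face entry held as a Python dictionary. It is an illustration of the field mapping, not a substitute for `truss migrate`. The function name is hypothetical, treating a missing `kind` as Hugging Face is an assumption, and bucket-based kinds are left out because the table does not say how their paths are stored in `model_cache`.

```python
def model_cache_to_weights(entry: dict) -> dict:
    """Convert one Hugging Face model_cache entry to a weights entry."""
    kind = entry.get("kind", "hf")  # assumption: Hugging Face when kind is absent
    if kind != "hf":
        raise ValueError("this sketch only covers Hugging Face entries")
    # repo_id and revision merge into a single source URI.
    source = f"hf://{entry['repo_id']}"
    if entry.get("revision"):
        source += f"@{entry['revision']}"
    weights = {
        "source": source,
        # volume_folder becomes an absolute path under /app/model_cache.
        "mount_location": f"/app/model_cache/{entry['volume_folder']}",
    }
    if entry.get("runtime_secret_name"):
        weights["auth"] = {
            "auth_method": "CUSTOM_SECRET",
            "auth_secret_name": entry["runtime_secret_name"],
        }
    # Pattern lists carry over unchanged.
    for key in ("allow_patterns", "ignore_patterns"):
        if key in entry:
            weights[key] = entry[key]
    return weights
```

Keeping `mount_location` under `/app/model_cache/` preserves the path your existing code already reads from, so `model.py` needs no path changes.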
- After (weights)
- Before (model_cache)
config.yaml
Chains migration
For Truss Chains, update `Assets.cached` to `Assets.weights` in your Python code:
- After (weights)
- Before (cached)
- `ModelRepo` → `WeightsSource`.
- `repo_id` + `revision` → `source` URI with `@revision` suffix.
- `volume_folder` → `mount_location` (must be absolute path).
- `runtime_secret_name` → `auth.auth_secret_name` (inside an `auth` block with `auth_method: CUSTOM_SECRET`).
- Remove `use_volume` and `kind` (inferred from URI scheme).
Custom server migration
When migrating an existing custom server deployment from `model_cache` to `weights`:

- Remove `truss-transfer-cli` from your `start_command`. Files are pre-mounted before the container starts.
- Update file paths from `/app/model_cache/{volume_folder}` to your new `mount_location`.
- After (weights)
- Before (model_cache)
config.yaml
Automatic use with engine builders
Engine-builder deployments use BDN automatically. No `weights` block is required, and no configuration changes are needed when migrating an existing engine-builder deployment.
| Engine | When BDN is used |
|---|---|
| BEI | Every deploy. |
| Briton (Engine-Builder-LLM) | Every deploy. |
| BIS-LLM (V2) | Every deploy. |
Next steps
- Secrets: Store credentials for private weight sources.
- Custom Docker images: Deploy vLLM, SGLang, and other inference servers.
- Autoscaling: Configure replica scaling and cold start behavior.
- Configuration reference: Full list of `weights` options.