weights key in your config.
This works with both Model class deployments and custom Docker images.
- Get started: Add weights to a new model
- Custom servers: Use with vLLM, SGLang, and more
- Migrate: Move from model_cache
Quick start
Add a weights section to your config.yaml:
config.yaml
| Field | Description |
|---|---|
| source | Where to fetch weights from. Supports Hugging Face, S3, GCS, and R2. |
| mount_location | Absolute path where weights appear in your container. |
| allow_patterns | Optional. Only download files matching these patterns. Useful for skipping large files you don’t need. See filtering files. |
| ignore_patterns | Optional. Exclude files matching these patterns. Useful for skipping documentation or unused formats. |
For private sources, add auth_secret_name to reference a Baseten secret with your credentials.
Accessing weights in your model
When your model starts, weights are already downloaded and available at your mount_location.
The directory structure from the source is preserved:
Read the files from that path in your load() method. No download code needed:
model.py
Weights are mirrored when you run truss push and then cached, so cold starts only read from local or nearby caches.
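As a minimal sketch of what reading pre-mounted weights can look like, the class below follows the usual Truss Model layout (__init__, load, predict). The /models/example path and the weights_dir keyword argument are hypothetical, introduced only so the example can be run against any directory; substitute your own mount_location and your framework's loading call.

```python
from pathlib import Path

# Hypothetical mount_location; use the value from your own config.yaml.
WEIGHTS_DIR = Path("/models/example")


class Model:
    def __init__(self, **kwargs):
        # "weights_dir" is an illustrative override, not a Truss-provided kwarg.
        self._weights_dir = Path(kwargs.get("weights_dir", WEIGHTS_DIR))
        self._files = []

    def load(self):
        # No download step: the files are expected to be mounted already.
        if not self._weights_dir.is_dir():
            raise FileNotFoundError(f"weights not found at {self._weights_dir}")
        # Record the files, relative to the mount point, preserving the
        # directory structure of the source.
        self._files = sorted(
            p.relative_to(self._weights_dir).as_posix()
            for p in self._weights_dir.rglob("*")
            if p.is_file()
        )

    def predict(self, model_input):
        # Placeholder: a real model would run inference here.
        return {"weight_files": self._files}
```

In a real model, load() would hand the directory to your inference library instead of listing files.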
Configuration reference
weights
A list of weight sources to mount into your model container.
config.yaml
source: URI specifying where to fetch weights from. Supported schemes:
- hf://: Hugging Face Hub.
- s3://: AWS S3.
- gs://: Google Cloud Storage.
- r2://: Cloudflare R2.
Hugging Face sources accept an optional @revision suffix (branch, tag, or commit SHA).
mount_location: Absolute path where weights will be mounted in your container. Must start with /.
auth_secret_name: Name of a Baseten secret containing credentials for accessing private weight sources. See Source types and authentication for the expected secret format for each source type.
allow_patterns: File patterns to include. Uses Unix shell-style wildcards. Only matching files will be downloaded.
ignore_patterns: File patterns to exclude. Uses Unix shell-style wildcards. Matching files will be skipped.
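To get a feel for how Unix shell-style wildcards select files, the sketch below uses Python's standard fnmatch module. It is an illustration only: the select_files helper is hypothetical, and the assumption that ignore_patterns is applied after allow_patterns is mine, not something this page states about Baseten's matching.

```python
from fnmatch import fnmatchcase


def select_files(files, allow_patterns=None, ignore_patterns=None):
    """Return the file names that would be kept under the given patterns."""
    selected = []
    for name in files:
        # With allow_patterns set, a file must match at least one of them.
        if allow_patterns and not any(fnmatchcase(name, p) for p in allow_patterns):
            continue
        # Assumed precedence: a file matching any ignore pattern is skipped.
        if ignore_patterns and any(fnmatchcase(name, p) for p in ignore_patterns):
            continue
        selected.append(name)
    return selected


files = ["config.json", "model-00001.safetensors", "pytorch_model.bin", "README.md"]
kept = select_files(files, allow_patterns=["*.safetensors", "*.json"])
```

Here kept contains only config.json and model-00001.safetensors, so the .bin duplicate and the README would not be downloaded.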
Source types and authentication
For private weight sources, create a Baseten secret with the appropriate credentials. Manage secrets in your Baseten settings.
Hugging Face
Download weights from Hugging Face Hub repositories.
config.yaml
hf://owner/repo@revision
- owner/repo: The Hugging Face repository.
- @revision: Branch, tag, or commit SHA.
Revision pinning: When you use a branch name like @main, Baseten resolves it to the specific commit SHA at deploy time and mirrors those exact files. Your deployment stays pinned to that version, so subsequent scale-ups won’t pick up new commits. To update to newer weights, push a new deployment.
| Secret Name | Secret Value |
|---|---|
| hf_access_token | hf_xxxxxxxxxxxxxxxx... |
AWS S3
Download weights from an S3 bucket.
config.yaml
s3://bucket/path
Authentication: JSON with AWS credentials
| Secret Name | Secret Value |
|---|---|
| aws_credentials | {"access_key_id": "AKIA...", "secret_access_key": "...", "region": "us-west-2"} |
The region field is required. Optionally include session_token for temporary credentials.
Google Cloud Storage
Download weights from a GCS bucket.
config.yaml
gs://bucket/path
Authentication: GCP service account JSON key
| Secret Name | Secret Value |
|---|---|
| gcp_service_account | {"type": "service_account", "project_id": "...", ...} |
Cloudflare R2
Download weights from a Cloudflare R2 bucket.
config.yaml
r2://account_id.bucket/path
- account_id: Your Cloudflare account ID.
- bucket: R2 bucket name, separated from account_id by a period.
- path: Path prefix within the bucket.
| Secret Name | Secret Value |
|---|---|
| r2_credentials | {"access_key_id": "...", "secret_access_key": "..."} |
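The URI formats above can be summarized with a small parser. This is a sketch for checking your own source strings before pushing, not Baseten's parser: parse_source and ParsedSource are hypothetical names, and the edge-case handling (for example, splitting an R2 host on its first period) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

SCHEMES = {"hf", "s3", "gs", "r2"}


@dataclass
class ParsedSource:
    scheme: str
    location: str                      # owner/repo for hf, bucket name otherwise
    path: str = ""                     # path prefix within the bucket
    revision: Optional[str] = None     # hf only
    account_id: Optional[str] = None   # r2 only


def parse_source(uri: str) -> ParsedSource:
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in SCHEMES or not rest:
        raise ValueError(f"unsupported weights source: {uri!r}")
    if scheme == "hf":
        # hf://owner/repo@revision, where the revision is optional.
        repo, _, revision = rest.partition("@")
        return ParsedSource(scheme, repo, revision=revision or None)
    head, _, path = rest.partition("/")
    if scheme == "r2":
        # r2://account_id.bucket/path: split on the first period (assumed).
        account_id, dot, bucket = head.partition(".")
        if not dot or not bucket:
            raise ValueError(f"expected r2://account_id.bucket/path, got {uri!r}")
        return ParsedSource(scheme, bucket, path, account_id=account_id)
    return ParsedSource(scheme, head, path)
```

For example, parse_source("r2://abc123.my-bucket/models/llama") yields account_id "abc123", bucket "my-bucket", and path "models/llama"; the account ID and bucket name here are placeholders.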
Migration from model_cache
Automated migration with truss migrate
The truss migrate CLI command automatically converts model_cache configurations:
- Show a colorized diff of the proposed changes.
- Prompt for confirmation before applying.
- Create a backup of your original config.yaml.
- Warn about any model.py path changes needed.
Manual migration reference
From model_cache to weights:
| model_cache | weights |
|---|---|
| repo_id: "owner/repo" | source: "hf://owner/repo@rev" |
| revision: "main" | Included in source URI as @main |
| kind: "s3" | Prefix: s3://bucket/path |
| kind: "gcs" | Prefix: gs://bucket/path |
| kind: "r2" | Prefix: r2://account_id.bucket/path |
| volume_folder: "name" | mount_location: "/app/model_cache/name" |
| runtime_secret_name | auth_secret_name |
| allow_patterns | allow_patterns (same) |
| ignore_patterns | ignore_patterns (same) |
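The mapping in the table can be expressed as a small conversion function. This is a sketch of the manual mapping only, not the implementation of truss migrate. The migrate_entry name is hypothetical, and two behaviors are assumptions: entries without a kind are treated as Hugging Face, and for non-Hugging Face kinds the repo_id is taken to hold the bucket path, with or without its scheme.

```python
# Scheme prefixes per model_cache kind, taken from the mapping table.
PREFIXES = {"s3": "s3://", "gcs": "gs://", "r2": "r2://"}


def migrate_entry(entry: dict) -> dict:
    """Convert one model_cache entry (as a dict) into a weights entry."""
    kind = entry.get("kind", "hf")  # assumed default
    repo_id = entry["repo_id"]
    if kind == "hf":
        source = f"hf://{repo_id}"
        if entry.get("revision"):
            source += f"@{entry['revision']}"
    else:
        prefix = PREFIXES[kind]
        # Keep the scheme if repo_id already carries it (assumed shape).
        source = repo_id if repo_id.startswith(prefix) else prefix + repo_id
    weights = {
        "source": source,
        "mount_location": f"/app/model_cache/{entry['volume_folder']}",
    }
    if "runtime_secret_name" in entry:
        weights["auth_secret_name"] = entry["runtime_secret_name"]
    for key in ("allow_patterns", "ignore_patterns"):
        if key in entry:
            weights[key] = list(entry[key])
    # use_volume and kind are dropped: the URI scheme carries the source type.
    return weights


old = {"repo_id": "owner/repo", "revision": "main", "volume_folder": "name",
       "use_volume": True, "runtime_secret_name": "hf_access_token"}
new = migrate_entry(old)
```

Keeping mount_location at /app/model_cache/name means existing model.py paths continue to resolve; choosing a different location would require updating those paths.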
- After (weights)
- Before (model_cache)
config.yaml
Chains migration
For Truss Chains, update Assets.cached to Assets.weights in your Python code:
- After (weights)
- Before (cached)
- ModelRepo → WeightsSource.
- repo_id + revision → source URI with @revision suffix.
- volume_folder → mount_location (must be absolute path).
- runtime_secret_name → auth_secret_name.
- Remove use_volume and kind (inferred from URI scheme).
Custom server migration
If you’re using a custom server with model_cache, you’ll need to make additional changes when migrating to weights:
- Remove truss-transfer-cli from your start_command. With weights, files are pre-mounted before your container starts.
- Update file paths from /app/model_cache/{volume_folder} to your new mount_location.
- After (weights)
- Before (model_cache)
config.yaml
Best practices
Pin to specific commits
Always pin to a specific commit SHA for reproducible deployments.
Filter files with patterns
Only download what you need to minimize cold start time.
Use absolute mount paths
The mount_location must be an absolute path (starting with /).
Keep mount locations unique
Each weight source must have a unique mount_location.
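Several of these practices can be checked mechanically before you push. The sketch below is a hypothetical local lint, not a Truss feature: check_weights is an illustrative name, it takes the weights list as plain dictionaries, and it assumes a pinned Hugging Face revision is a full 40-character hexadecimal commit SHA.

```python
import re

# Assumption: a pinned revision is a full 40-character hex commit SHA.
COMMIT_SHA = re.compile(r"^[0-9a-f]{40}$")


def check_weights(weights):
    """Return a list of human-readable problems found in a weights list."""
    problems = []
    seen_mounts = set()
    for i, entry in enumerate(weights):
        source = entry.get("source", "")
        mount = entry.get("mount_location", "")
        if not mount.startswith("/"):
            problems.append(f"weights[{i}]: mount_location must be an absolute path")
        if mount in seen_mounts:
            problems.append(f"weights[{i}]: mount_location {mount!r} is already used")
        seen_mounts.add(mount)
        if source.startswith("hf://"):
            # Everything after "@" is the revision; empty if none was given.
            _, _, revision = source.partition("@")
            if not COMMIT_SHA.match(revision):
                problems.append(f"weights[{i}]: source is not pinned to a commit SHA")
    return problems
```

An empty result means the list passes these particular checks; it says nothing about whether the sources exist or the credentials are valid.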
When weights are re-mirrored
Baseten caches weights based on a hash of their configuration and reuses cached weights when possible to avoid redundant downloads.
Deduplication and mutation detection: Baseten deduplicates files based on their etag (a content hash), not just filename, and only re-mirrors files that have been mutated since the last pull. Unchanged files are reused from blob storage, even across deployments.
Changes that trigger re-mirroring:
| Field | Re-mirrors? | Why |
|---|---|---|
| source | ✅ Yes | Different repository, revision, or path |
| allow_patterns | ✅ Yes | Different files will be downloaded |
| ignore_patterns | ✅ Yes | Different files will be downloaded |
Changes that do not trigger re-mirroring:
| Field | Re-mirrors? | Why |
|---|---|---|
| auth_secret_name | ❌ No | Credentials don’t affect which files are mirrored |
| mount_location | ❌ No | Only affects where weights appear in your container |
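One way to picture the behavior in these tables is a cache key computed only from the fields that affect which files are mirrored. The sketch below is an illustration of that idea; the actual hash Baseten uses is not described on this page, and mirror_key with its SHA-256-over-JSON scheme is purely hypothetical.

```python
import hashlib
import json

# Fields listed above as triggering re-mirroring when they change.
MIRROR_FIELDS = ("source", "allow_patterns", "ignore_patterns")


def mirror_key(entry: dict) -> str:
    """Hash only the fields that determine which files get mirrored."""
    relevant = {field: entry.get(field) for field in MIRROR_FIELDS}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = {"source": "hf://owner/repo@main", "mount_location": "/models/a"}
moved = dict(base, mount_location="/models/b", auth_secret_name="hf_access_token")
filtered = dict(base, allow_patterns=["*.safetensors"])
```

Under this scheme, base and moved share a key because only the mount path and secret name differ, while filtered gets a new key because a different set of files would be downloaded.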
How it works
What happens when you truss push
Your truss push command returns immediately after the deployment is created in Baseten. The mirroring process runs in the background, but your model will not be deployed to the Workload Plane until mirroring completes. This ensures weights are available before your model pod starts.
What happens on cold start
Baseten runs multiple Workload Planes across regions and clusters. Each Workload Plane has its own in-cluster cache for fast weight delivery.
When your model pod starts:
- The BDN Agent on the node fetches the manifest for your weights.
- Weights are downloaded through the In-Cluster Cache (shared across pods in the cluster).
- Weights are stored in the Node Cache (part of the BDN Agent, shared across pods on the same node).
- Weights are mounted read-only to your model pod.
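The steps above describe a read-through lookup across two cache tiers. The toy model below illustrates that pattern with in-memory dictionaries. It is not how the BDN Agent is implemented: the ClusterCache and NodeCache classes and their counters are hypothetical, and real tiers deal with files, manifests, and eviction that this sketch ignores.

```python
class ClusterCache:
    """Toy in-cluster cache, shared by every node in one cluster."""

    def __init__(self, mirrored_storage: dict):
        self._storage = mirrored_storage  # stands in for mirrored Baseten storage
        self._blobs = {}
        self.storage_reads = 0

    def get(self, key: str) -> bytes:
        if key not in self._blobs:
            # Cluster-level miss: read once from mirrored storage.
            self.storage_reads += 1
            self._blobs[key] = self._storage[key]
        return self._blobs[key]


class NodeCache:
    """Toy node cache, shared by pods on the same node."""

    def __init__(self, cluster_cache: ClusterCache):
        self._cluster = cluster_cache
        self._blobs = {}
        self.cluster_reads = 0

    def get(self, key: str) -> bytes:
        if key not in self._blobs:
            # Node-level miss: fall through to the in-cluster cache.
            self.cluster_reads += 1
            self._blobs[key] = self._cluster.get(key)
        return self._blobs[key]


storage = {"model.safetensors": b"weights"}
cluster = ClusterCache(storage)
node_a, node_b = NodeCache(cluster), NodeCache(cluster)
node_a.get("model.safetensors")  # first replica: fills both tiers
node_a.get("model.safetensors")  # second replica on the same node: node cache hit
node_b.get("model.safetensors")  # replica on another node: in-cluster cache hit
```

After these three lookups, mirrored storage has been read once, and each node has reached the in-cluster cache once.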
Key benefits
- Non-blocking push → truss push returns immediately; mirroring happens in the background.
- One-time mirroring → Weights are mirrored to Baseten storage once, not on every cold start.
- No upstream dependency at runtime → Once mirrored, scale-ups and inference never contact the original source.
- Multi-tier caching → In-cluster cache prevents redundant downloads; node cache provides instant access for subsequent replicas.
- Deduplication → Identical weight files are stored once and shared via hardlinks.
- Parallel downloads → Large models download faster with concurrent chunk fetching.
Next steps
- Secrets — Store credentials for private weight sources.
- Custom Docker images — Deploy vLLM, SGLang, and other inference servers.
- Autoscaling — Configure replica scaling and cold start behavior.
- Configuration reference — Full list of weights options.