

Training jobs need model weights, training datasets, and configuration files. Baseten provides multiple ways to get data into your training container, from cached delivery through Baseten Delivery Network (BDN) to direct downloads in your training script.

Load weights and data with BDN

Use the weights parameter on TrainingJob to mount model weights and training data into your container through BDN. BDN mirrors your data once and serves it from multi-tier caches, so subsequent jobs start faster.
BDN mirrors your weights to Baseten storage during the CREATED state, before any compute is provisioned. Once your job is scheduled on a node, BDN places the weights on local disk before your start_commands run. Weight delivery never overlaps with workload execution, so BDN has no effect on training throughput. The only difference between a cache hit and a cache miss is how long the deploy phase takes.
Each weight source specifies a remote URI and a local mount path. When your container starts, the data is already available at the mount_location. No download code needed in your training script.

Hugging Face and S3 example

Load model weights from Hugging Face and training data from S3, mounted into the training container before your code runs:
config.py
from truss_train import TrainingProject, TrainingJob, Image, Compute, Runtime, WeightsSource
from truss.base.truss_config import AcceleratorSpec

training_job = TrainingJob(
    image=Image(base_image="pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime"),
    compute=Compute(
        accelerator=AcceleratorSpec(accelerator="H200", count=1),
    ),
    runtime=Runtime(
        start_commands=["python train.py"],
    ),
    weights=[
        WeightsSource(
            source="hf://Qwen/Qwen3-0.6B",
            mount_location="/app/models/Qwen/Qwen3-0.6B",
        ),
        WeightsSource(
            source="s3://my-bucket/training-data",
            mount_location="/app/data/training-data",
        ),
    ],
)

training_project = TrainingProject(name="qwen3-finetune", job=training_job)
In your training script, reference the mount paths directly:
train.py
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("/app/models/Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("/app/models/Qwen/Qwen3-0.6B")

# Training data is available at /app/data/training-data/
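From there, a minimal sketch of loading the mounted dataset, assuming the S3 prefix contains JSONL files and using the Hugging Face datasets library (neither is required by Baseten; adapt to your data format and tooling):
from datasets import load_dataset

# Assumption: the mounted S3 prefix holds JSONL files; adjust the glob to your format.
dataset = load_dataset(
    "json",
    data_files="/app/data/training-data/*.jsonl",
    split="train",
)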

Supported sources

BDN supports these URI schemes:
Scheme   Example                              Description
hf://    hf://meta-llama/Llama-3.1-8B@main    Hugging Face Hub.
s3://    s3://my-bucket/path/to/data          Amazon S3.
gs://    gs://my-bucket/path/to/data          Google Cloud Storage.
r2://    r2://account_id.bucket/path          Cloudflare R2.
For Hugging Face sources, pin to a specific revision with the @revision suffix (branch, tag, or commit SHA).
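For example, a sketch pinning the Qwen weights from the earlier config to a fixed revision (the placeholder SHA is illustrative):
WeightsSource(
    source="hf://Qwen/Qwen3-0.6B@<commit-sha>",  # replace <commit-sha> with a real branch, tag, or commit SHA
    mount_location="/app/models/Qwen/Qwen3-0.6B",
)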

Authentication

Private or gated sources require authentication. Add an auth block to your WeightsSource. For example, store a Hugging Face token as a Baseten secret and reference it by name:
WeightsSource(
    source="hf://meta-llama/Llama-3.1-8B@main",
    mount_location="/app/models/llama",
    auth={"auth_method": "CUSTOM_SECRET", "auth_secret_name": "hf_access_token"},
)
For the full list of authentication options and source-specific configuration, see the BDN configuration reference.

Filtering files

Use allow_patterns and ignore_patterns to download only the files you need:
WeightsSource(
    source="hf://meta-llama/Llama-3.1-8B@main",
    mount_location="/app/models/llama",
    allow_patterns=["*.safetensors", "config.json", "tokenizer.*"],
    ignore_patterns=["*.md", "*.txt"],
)

How BDN serves training jobs

When you submit a training job, BDN compares your weights config to what’s already in Baseten storage, pulls anything missing from the upstream source, and stages the full set on the node before your start_commands run. Data delivery happens entirely during the CREATED and DEPLOYING phases. Two cache tiers sit in front of Baseten’s mirror:
  • Cluster-local cache: shared across nodes in a GPU cluster. Populated the first time a job in that cluster pulls a given set of files.
  • Node-local cache: lives on the node itself. Populated when a job lands on that node.
Both caches evict with LRU. On a node-local hit, the node mounts the data directly and your job starts almost immediately. On a cluster-local hit, BDN transfers the data from the cluster cache to the node, which adds a small amount of deploy time. On a full miss, BDN pulls from its mirror, which adds more deploy time. None of these affect training throughput.

BDN or training cache?

Use BDN for read-only inputs that are known at job start, like model weights and frozen datasets. Baseten delivers them before training begins, so you never pay for IO or compute time while they load. Use the training cache when you need read-write storage that persists across jobs, or when one job produces data that a later job consumes. Common examples: pip package installs, compiled artifacts, and preprocessed datasets you build once and reuse.

Storage types overview

Baseten Training provides four ways to move data in and out of a job:
Storage type       Persistence                                          Use case
BDN (weights)      Mirrored once; cluster- and node-local LRU caches    Read-only model weights and datasets known at job start.
Training cache     Read-write, persistent between jobs                  Pip packages, compiled artifacts, preprocessed datasets.
Checkpointing      Backed up to cloud storage                           Model checkpoints and artifacts you want to deploy or download.
Ephemeral storage  Cleared after job completes                          Temporary files, intermediate outputs.
Training cache is scoped to a single GPU cluster. Data cached on one cluster (for example, H100) is not available on a different cluster (for example, H200). To use the same data on multiple clusters, duplicate it to each cluster’s cache or load it through BDN.

Ephemeral storage

Write temporary files to the $BT_SCRATCH_DIR directory. This path is backed by local NVMe storage on the node and is cleared when your job completes. Use it for:
  • Temporary files during training.
  • Intermediate outputs that don’t need to persist.
  • Scratch space for data processing.
import os

# Baseten sets BT_SCRATCH_DIR to node-local NVMe scratch space; it is cleared when the job completes.
scratch = os.environ["BT_SCRATCH_DIR"]
tmp_output = os.path.join(scratch, "processed_data")
Do not write temporary files to arbitrary paths like /tmp or /root. Always use $BT_SCRATCH_DIR so Baseten can manage storage across hardware configurations.

Loading data in your training script

When data isn’t available through a BDN-supported URI scheme, download it directly in your training script. This works well for datasets loaded through framework-specific libraries or for custom download logic.
For example, to pull training data from a private S3 bucket, authenticate with Baseten secrets:
  1. Add your AWS credentials as secrets in your Baseten account.
  2. Reference the secrets in your job configuration:
    from truss_train import definitions
    
    runtime = definitions.Runtime(
        environment_variables={
            "AWS_ACCESS_KEY_ID": definitions.SecretReference(name="aws_access_key_id"),
            "AWS_SECRET_ACCESS_KEY": definitions.SecretReference(name="aws_secret_access_key"),
        },
    )
    
  3. Download from S3 in your training script:
    import boto3
    
    s3 = boto3.client('s3')
    s3.download_file('my-bucket', 'training-data.tar.gz', '/path/to/local/file')
    
To avoid re-downloading large datasets on each job, download to the training cache and check if files exist before downloading.
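A sketch of that pattern, building on the S3 step above (the cache path below is a placeholder; use the mount path from the Cache guide):
import os

import boto3

# Placeholder: point this at your training cache mount (see the Cache guide for the actual path).
CACHE_DIR = "/path/to/training-cache"
local_path = os.path.join(CACHE_DIR, "training-data.tar.gz")

# Only download when the file isn't already cached from a previous job.
if not os.path.exists(local_path):
    s3 = boto3.client("s3")
    s3.download_file("my-bucket", "training-data.tar.gz", local_path)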

Data size and limits

Size    Description
Small   A few GBs.
Medium  Up to 1 TB (most common).
Large   1-10 TB.
The default training cache is 1 TB. Contact support to increase the cache size for larger datasets.

Data security

Data transfer happens within Baseten’s VPC using secure connections. Baseten doesn’t share customer data across tenants. When you enable training cache, data persists between jobs until you delete the project. Ephemeral storage is cleared when your job completes. For self-hosted deployments, training can use storage buckets in your own AWS or GCP account. To learn more and access official policies and certifications, visit the Baseten Trust Center.

Storage performance

Read and write speeds vary by cluster and storage configuration:
Storage type    Write speed          Read speed
Node storage    1.2-1.8 GB/s         1.7-2.1 GB/s
Training cache  340 MB/s - 1.0 GB/s  470 MB/s - 1.6 GB/s
For workloads with high I/O or large storage requirements, contact support.

Next steps

  • BDN configuration reference: Full list of weight source options, authentication methods, and supported URI schemes.
  • Cache: Persist data between jobs and speed up training iterations.
  • Checkpointing: Save and manage model checkpoints during training.
  • Multinode training: Scale training across multiple nodes with shared cache access.