

The right storage path depends on whether your files are known when you deploy or written by your model at runtime.

For read-only data known at deploy time, such as model weights, tokenizers, and config files, BDN is the recommended path. BDN mirrors weights from Hugging Face, S3, GCS, or R2 once and serves them through multi-tier caches close to your pods, so cold starts read from local or nearby caches instead of pulling from the source on every scale-up. For small local files, bundle them with your Truss in the data/ directory; for cases that need custom download logic, fetch them at runtime from your model code.

For data your model writes during execution that other replicas can reuse, such as torch.compile artifacts and JIT caches, use b10cache. b10cache mounts a shared volume into every pod, so writes from one replica become reads for the next: the first replica writes the file, and later replicas read it back instead of regenerating it. Treat b10cache as a cache rather than a database, and always have a fallback path that runs if the file isn't there yet.
The bundle and runtime-download alternatives slow cold starts compared to BDN: bundles re-pull from the container image on every scale-up, and runtime downloads re-fetch from the source unless you add caching.
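The cache-then-fallback pattern for b10cache can be sketched as follows. This is a minimal illustration, not a Baseten API: the helper name and the idea of passing in the mount directory are assumptions, and atomic writes via a temp file plus rename are one reasonable way to avoid readers seeing partial files.

```python
import os


def load_cached(cache_dir: str, name: str, generate):
    """Return the artifact from the shared cache if present; otherwise
    regenerate it, write it atomically, and return it."""
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # Cache hit: a previous replica already wrote this artifact.
        with open(path, "rb") as f:
            return f.read()
    # Fallback path: first replica (or any cache miss) regenerates the data.
    artifact = generate()
    os.makedirs(cache_dir, exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(artifact)
    os.replace(tmp, path)  # atomic rename so readers never see a partial file
    return artifact
```

With this shape, every replica can call the same function: only replicas that miss the cache pay the regeneration cost.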

Bundle files with your Truss

Use this path only when your data/ directory is under ~1 GB total. The files ship inside the container image, so every cold start re-pulls them, not just the first deploy. Larger bundles compound into slower scale-ups, and truss push itself slows down as the bundle grows. For anything bigger, use BDN instead.
Store model files inside your Truss using the data/ directory. The contents are copied into your container image at build time and mounted at /app/data at runtime. Example: Stable Diffusion 2.1 Truss structure
data/
    scheduler/
        scheduler_config.json
    text_encoder/
        config.json
        diffusion_pytorch_model.bin
    tokenizer/
        merges.txt
        tokenizer_config.json
        vocab.json
    unet/
        config.json
        diffusion_pytorch_model.bin
    vae/
        config.json
        diffusion_pytorch_model.bin
    model_index.json
Access bundled files in model.py:
import torch
from diffusers import StableDiffusionPipeline

class Model:
    def __init__(self, **kwargs):
        self._data_dir = kwargs["data_dir"]

    def load(self):
        self.model = StableDiffusionPipeline.from_pretrained(
            str(self._data_dir),
            revision="fp16",
            torch_dtype=torch.float16,
        ).to("cuda")

Download files at runtime

Use this pattern when you need fine-grained control over the download, such as decrypting files on the fly or lazily fetching a subset of a larger dataset. The example below loads weights from a private S3 bucket using boto3.
To load private S3 weights at deploy time, prefer BDN with IAM credentials. BDN mirrors the weights once and serves them from a multi-tier cache; the pattern below re-downloads on every cold start unless you add caching.

Step 1: Define AWS secrets in config.yaml

secrets:
  aws_access_key_id: null
  aws_secret_access_key: null
  aws_region: null # for example, us-east-1
  aws_bucket: null
Do not store actual credentials in config.yaml. Add them securely to the Baseten secrets manager.

Step 2: Authenticate with AWS in model.py

import boto3

class Model:
    def __init__(self, **kwargs):
        self._config = kwargs.get("config")
        secrets = kwargs.get("secrets")
        self.s3_client = boto3.client(
            "s3",
            aws_access_key_id=secrets["aws_access_key_id"],
            aws_secret_access_key=secrets["aws_secret_access_key"],
            region_name=secrets["aws_region"],
        )
        self.s3_bucket = secrets["aws_bucket"]
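The download itself then happens in load(). The sketch below is illustrative: the object key and local path are hypothetical placeholders, not values from your deployment, and the method belongs on the same Model class as the __init__ above.

```python
# Hypothetical load() method for the Model class above; the S3 object key
# and the local path are illustrative assumptions.
def load(self):
    local_path = "/tmp/model_weights.bin"
    # Download the weights object from the private bucket to local disk.
    self.s3_client.download_file(
        Bucket=self.s3_bucket,
        Key="weights/model_weights.bin",
        Filename=local_path,
    )
    # Hand the downloaded file to your framework's loader here.
    self.weights_path = local_path
```

Remember that this runs on every cold start; cache the result (or use BDN) if the download dominates your startup time.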

Step 3: Deploy

truss push --watch
If your model downloads weights at runtime via custom code, BDN proxy can cache those downloads across replicas. Available by request.