The right storage path depends on whether your files are known when you deploy or written by your model at runtime. For read-only data known at deploy time, such as model weights, tokenizers, and config files, BDN is the recommended path. BDN mirrors weights from Hugging Face, S3, GCS, or R2 once and serves them through multi-tier caches close to your pods, so cold starts read from local or nearby caches instead of pulling from the source on every scale-up. For local files, bundle them with your Truss in the data/ directory; for cases that need custom download logic, fetch them at runtime from your model code.
For data your model writes during execution that other replicas can reuse, such as torch.compile artifacts and JIT caches, use b10cache. b10cache mounts a shared volume into every pod so writes from one replica become reads for the next. The first replica writes the file; later replicas read it back instead of regenerating it. Treat b10cache as a cache rather than a database, and always have a fallback path that runs if the file isn’t there yet.
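A minimal sketch of the read-or-regenerate pattern, assuming b10cache appears as an ordinary directory inside the pod. The mount path below (/cache/org) is an assumption; use the path your deployment documents. The build callable is a hypothetical stand-in for whatever produces the artifact, such as a compile step. Writing to a temporary file and renaming it means a replica reading the cache never sees a half-written file, and any cache failure falls through to the freshly built result.

```python
import os
import tempfile
from pathlib import Path

# Assumption: b10cache mount path; replace with the path used in your deployment.
DEFAULT_CACHE_DIR = "/cache/org"


def load_or_build(cache_dir, name, build):
    """Return cached bytes for `name`, or build them, cache them, and return them."""
    path = Path(cache_dir) / name
    try:
        # Fast path: an earlier replica already wrote this artifact.
        return path.read_bytes()
    except OSError:
        pass  # Cache miss (or unreadable mount): fall back to regenerating.

    # Slow path: regenerate locally, exactly as if there were no cache.
    data = build()

    tmp = None
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then atomically rename it,
        # so concurrent readers never observe a partially written file.
        fd, tmp = tempfile.mkstemp(dir=path.parent)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except OSError:
        # Caching is best-effort: drop the temp file and still return the result.
        if tmp is not None and os.path.exists(tmp):
            os.unlink(tmp)
    return data
```

A call such as load_or_build(DEFAULT_CACHE_DIR, "my-model/compiled.bin", compile_fn), where compile_fn is hypothetical, runs the slow path only on the first replica; later replicas take the fast path.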
The bundle and runtime-download alternatives give slower cold starts than BDN: bundled files are re-pulled as part of the container image on every scale-up, and runtime downloads re-fetch from the source unless you add caching.
Bundle files with your Truss
Store model files inside your Truss using the data/ directory. The contents are copied into your container image at build time and mounted at /app/data at runtime.
Example: Stable Diffusion 2.1 Truss structure
model.py:
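A minimal sketch of how model.py can read bundled files, assuming Truss passes the data/ directory path to the model constructor as kwargs["data_dir"]. To keep the sketch runnable without a GPU, it reads a hypothetical config.json at the point where a real Stable Diffusion Truss would load its pipeline weights.

```python
import json
from pathlib import Path


class Model:
    def __init__(self, **kwargs):
        # Assumption: Truss passes the bundled data/ directory as kwargs["data_dir"].
        self._data_dir = Path(kwargs["data_dir"])
        self._config = None

    def load(self):
        # Hypothetical file: a Stable Diffusion Truss would load its pipeline
        # weights from self._data_dir here instead.
        with (self._data_dir / "config.json").open() as f:
            self._config = json.load(f)

    def predict(self, model_input):
        # Illustration only: report which bundled config served the request.
        return {
            "prompt": model_input.get("prompt"),
            "model_name": self._config["model_name"],
        }
```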
Download files at runtime
Use this pattern when you need fine-grained control over the download, such as decrypting files on the fly or lazily fetching a subset of a larger dataset. The example below loads weights from a private S3 bucket using boto3.
To load private S3 weights at deploy time, prefer BDN with IAM credentials. BDN mirrors the weights once and serves them from a multi-tier cache; the pattern below re-downloads on every cold start unless you add caching.
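A minimal sketch of a runtime S3 download, assuming AWS credentials are exposed to the model as Truss secrets via kwargs["secrets"]. The secret names, bucket, prefix, and local directory are hypothetical. boto3 is imported lazily so the download logic can be exercised with any object that offers get_paginator and download_file. Skipping files that already exist only helps within a single pod; a new pod still re-downloads everything.

```python
from pathlib import Path


def make_s3_client(secrets):
    # Imported lazily so download_prefix can be tested without boto3 installed.
    import boto3

    # Hypothetical secret names; use whatever names your Truss config declares.
    return boto3.client(
        "s3",
        aws_access_key_id=secrets["aws_access_key_id"],
        aws_secret_access_key=secrets["aws_secret_access_key"],
    )


def download_prefix(s3, bucket, prefix, dest_dir):
    """Download every object under `prefix` into `dest_dir`, keeping relative paths."""
    dest = Path(dest_dir)
    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # Skip zero-byte "folder" placeholder objects.
            rel = key[len(prefix):].lstrip("/") or Path(key).name
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            if not target.exists():  # Skip files already fetched in this pod.
                s3.download_file(bucket, key, str(target))
            downloaded.append(target)
    return downloaded


class Model:
    def __init__(self, **kwargs):
        # Assumption: Truss passes configured secrets as kwargs["secrets"].
        self._secrets = kwargs["secrets"]
        self._weights_dir = Path("/tmp/weights")  # Hypothetical local path.

    def load(self):
        s3 = make_s3_client(self._secrets)
        # Hypothetical bucket and prefix.
        download_prefix(s3, "my-private-bucket", "models/my-model/", self._weights_dir)
        # Load the weights from self._weights_dir with your framework here.
```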