

Use b10cache for files your model writes at runtime that other replicas can reuse, such as torch.compile artifacts. For read-only weights known at deploy time, use BDN. For more information, see Data and storage.

Overview

Deployments sometimes produce files that are useful to other replicas. For example, torch.compile produces a cache that speeds up future torch.compile calls on the same function, reducing cold start time for other replicas. b10cache stores these files. It’s a volume mounted over the network onto each of your pods, with two scopes:

Organization scope: /cache/org/

Shared across every pod you deploy in your organization. Move a file into this directory and any pod can read it.

Deployment scope: /cache/model/

Shared across every pod within a single deployment. Use this scope to keep deployment filesystems isolated.
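Publishing a file into either scope is an ordinary filesystem copy. A minimal sketch, assuming a helper named `publish_to_cache` (the function name and atomic-rename detail are illustrative; only the mount paths such as `/cache/org/` come from the docs). Writing to a temporary name first and then renaming means a pod on another replica never observes a half-written file:

```python
import os
import shutil
import tempfile

def publish_to_cache(local_path: str, cache_root: str = "/cache/org") -> str:
    """Copy a locally produced artifact into a shared b10cache scope.

    Illustrative helper: write under a temporary name, then rename into
    place so readers on other pods never see a partial file.
    """
    os.makedirs(cache_root, exist_ok=True)
    dest = os.path.join(cache_root, os.path.basename(local_path))
    # Stage the copy inside the cache directory so the final rename
    # stays on the same filesystem (rename is atomic on POSIX there).
    fd, tmp = tempfile.mkstemp(dir=cache_root)
    os.close(fd)
    shutil.copyfile(local_path, tmp)
    os.replace(tmp, dest)
    return dest
```

For the deployment scope, the same pattern applies with `cache_root="/cache/model"`.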

Not persistent object storage

b10cache is reliable, but treat it as a cache, not a database. Always have a fallback path that runs if the file isn’t there yet. For example, the first replica of a new deployment writes to b10cache rather than reading from it.
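The fallback pattern above can be sketched as a read-or-build helper. A minimal illustration, assuming a hypothetical `load_or_build` function and a caller-supplied `build` callable (neither name is from the docs; only the "read if present, otherwise build and populate" pattern is):

```python
import os

def load_or_build(cache_path: str, build) -> bytes:
    """Return an artifact from b10cache if present, otherwise build it
    and populate the cache so future replicas skip the work.

    Illustrative sketch of the fallback pattern: the cache is treated
    as an optimization, never as the only source of the artifact.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return f.read()
    # Cache miss: the first replica of a new deployment lands here and
    # does the expensive work itself (e.g. running torch.compile).
    artifact = build()
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    tmp = cache_path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(artifact)
    os.replace(tmp, cache_path)  # publish atomically for other readers
    return artifact
```

Because `build()` always works without the cache, a missing or evicted file degrades to a slower start rather than a failure.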
For more information, see Torch compile caching, which uses b10cache to persist torch.compile artifacts across replicas.