Deploying from cloud storage lets you use your existing infrastructure. The engine pulls weights from your storage at build time, compiles them with TensorRT-LLM, and serves the result as a production endpoint. You don't need to move or re-upload anything. Engine-Builder-LLM, BEI, and BIS-LLM all support this workflow.
To deploy from Baseten Training checkpoints instead, see Deploy with optimized inference engines.
Storage sources
The checkpoint_repository field in your config specifies where the engine pulls weights from. The source field accepts the following providers:
- S3: Amazon S3 buckets.
- GCS: Google Cloud Storage.
- AZURE: Azure Blob Storage.
- HF: Hugging Face repositories.
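As a minimal sketch, a checkpoint_repository entry in config.yaml might look like the following. The field nesting, the repo value, and the exact key names here are assumptions; see the Engine-Builder-LLM configuration reference for the authoritative schema.

```yaml
trt_llm:
  build:
    checkpoint_repository:
      source: HF                              # one of S3, GCS, AZURE, HF
      repo: meta-llama/Llama-3.1-8B-Instruct  # hypothetical repository path
```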
Private storage credentials
To access private storage, add a JSON secret to your Baseten secrets manager and reference it with runtime_secret_name in your config.
For S3, add a secret containing your AWS credentials, then reference that secret by name in your config. See the AWS S3 authentication guide for full setup details, including OIDC.
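A sketch of what this could look like for S3, assuming a secret named aws-s3-credentials that holds standard AWS key fields as JSON. The secret name, bucket path, and field nesting are illustrative; check the Engine-Builder-LLM configuration reference and the AWS S3 authentication guide for the exact schema.

```yaml
# Hypothetical secret stored in the Baseten secrets manager under the
# name "aws-s3-credentials", with a JSON value such as:
#   {"aws_access_key_id": "...", "aws_secret_access_key": "..."}

# config.yaml — point the checkpoint repository at the private bucket
# and reference the secret by name:
trt_llm:
  build:
    checkpoint_repository:
      source: S3
      repo: s3://my-bucket/path/to/weights   # hypothetical bucket path
      runtime_secret_name: aws-s3-credentials
```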
Related
- Engine-Builder-LLM configuration: Complete build and runtime options for LLMs.
- BEI reference configuration: Complete configuration for encoder models.
- BDN authentication guide: OIDC and service account authentication for cloud storage.
- Secrets management: Configure credentials for private storage.