Deploying from cloud storage lets you use your existing infrastructure. The engine pulls weights from your storage at build time, compiles them with TensorRT-LLM, and serves the result as a production endpoint. You don’t need to move or re-upload anything. Engine-Builder-LLM, BEI, and BIS-LLM all support this workflow.
To deploy from Baseten Training checkpoints instead, see Deploy with optimized inference engines.

Storage sources

The checkpoint_repository field in your config specifies where the engine pulls weights from. The source field accepts the following providers:
  • S3: Amazon S3 buckets.
  • GCS: Google Cloud Storage.
  • AZURE: Azure Blob Storage.
  • HF: Hugging Face repositories.
Here’s a minimal example using S3:
config.yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: S3  # or GCS, AZURE, HF
      repo: s3://your-bucket/path/to/model/
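The same shape works for the other providers; only the source and repo values change. As a sketch for a Hugging Face checkpoint (the repo id shown is a placeholder, and note that for HF the repo value is a repository id rather than a bucket URI):

```yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: HF
      repo: your-org/your-model  # Hugging Face repo id, not a URL
```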

Private storage credentials

To access private storage, add a JSON secret to your Baseten secrets manager and reference it by name with runtime_secret_name in your config.
Add a secret named aws_secret_json (the name must match the runtime_secret_name value in your config) with your AWS credentials:
{
  "aws_access_key_id": "XXXXX",
  "aws_secret_access_key": "xxxxx/xxxxxx",
  "aws_region": "us-west-2"
}
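Since a malformed secret only surfaces as a failed build, it can be worth sanity-checking the JSON before uploading it. A minimal sketch of such a check (the key names mirror the example secret above; the helper itself is illustrative and not part of any Baseten tooling):

```python
import json

# Key names taken from the example AWS secret above; this is a hypothetical
# local sanity check, not part of the Baseten API.
REQUIRED_KEYS = {"aws_access_key_id", "aws_secret_access_key", "aws_region"}

def validate_aws_secret(raw: str) -> dict:
    """Parse the secret JSON and ensure the expected credential keys exist."""
    creds = json.loads(raw)
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError(f"secret is missing keys: {sorted(missing)}")
    return creds

secret = '{"aws_access_key_id": "XXXXX", "aws_secret_access_key": "xxxxx", "aws_region": "us-west-2"}'
print(validate_aws_secret(secret)["aws_region"])  # us-west-2
```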
Then reference the secret in your config:
config.yaml
secrets:
  aws_secret_json: "set token in baseten workspace"
trt_llm:
  build:
    checkpoint_repository:
      source: S3
      repo: s3://your-bucket/path/to/model
      runtime_secret_name: aws_secret_json
See AWS S3 authentication for full setup details including OIDC.
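GCS and Azure follow the same pattern: store the provider's credential JSON as a Baseten secret and point runtime_secret_name at it. As a sketch for GCS (the secret name here is illustrative, and the assumption that the secret holds a standard service-account key JSON is not confirmed by this page):

```yaml
secrets:
  gcs_service_account: "set token in baseten workspace"
trt_llm:
  build:
    checkpoint_repository:
      source: GCS
      repo: gs://your-bucket/path/to/model
      runtime_secret_name: gcs_service_account
```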