Deploying from cloud storage lets you use your existing infrastructure. The engine pulls weights from your storage at build time, compiles them with TensorRT-LLM, and serves the result as a production endpoint. You don’t need to move or re-upload anything. Engine-Builder-LLM, BEI, and BIS-LLM all support this workflow.
To deploy from Baseten Training checkpoints instead, see Deploy with optimized inference engines.

Storage sources

The checkpoint_repository field in your config specifies where the engine pulls weights from. The source field accepts the following providers:
  • S3: Amazon S3 buckets.
  • GCS: Google Cloud Storage.
  • AZURE: Azure Blob Storage.
  • HF: Hugging Face repositories.
The revision field pins the version of the weights to pull. For Hugging Face repos, this is a git ref (branch name, tag, or commit SHA); if unset, the engine uses the default branch. For cloud storage sources (S3, GCS, AZURE), revision does not apply: the repo path points to a specific prefix instead.
Here’s a minimal example using S3:
config.yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: S3  # or GCS, AZURE, HF
      repo: s3://your-bucket/path/to/model/
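For comparison, a Hugging Face source with a pinned revision might look like the sketch below. It assumes that revision sits alongside source and repo inside checkpoint_repository, as the field description above suggests. The repository ID your-org/your-model is a hypothetical placeholder, not a real repo.

```yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: HF
      repo: your-org/your-model  # hypothetical placeholder repository ID
      revision: main  # git ref: branch name, tag, or commit SHA
```

Pinning a commit SHA rather than a branch name keeps builds reproducible if the upstream repo changes.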

Private storage credentials

To access private storage, add a JSON secret to your Baseten secrets manager and reference it with runtime_secret_name in your config.
For S3, add a secret containing your AWS credentials as JSON:
{
  "aws_access_key_id": "XXXXX",
  "aws_secret_access_key": "xxxxx/xxxxxx",
  "aws_region": "us-west-2"
}
Then declare the secret under secrets in your config and reference it by name with runtime_secret_name:
config.yaml
secrets:
  aws_secret_json: "set token in baseten workspace"  # placeholder; the real JSON value is stored in your Baseten workspace
trt_llm:
  build:
    checkpoint_repository:
      source: S3
      repo: s3://your-bucket/path/to/model
      runtime_secret_name: aws_secret_json
See AWS S3 authentication for full setup details, including OIDC.