Deploying from cloud storage lets you serve models from the infrastructure you already have. The engine pulls weights from your storage at build time, compiles them with TensorRT-LLM, and serves the result as a production endpoint; you don't need to move or re-upload anything. Engine-Builder-LLM, BEI, and BIS-LLM all support this workflow.
To deploy from Baseten Training checkpoints instead, see Deploy with optimized inference engines.

Storage sources

The checkpoint_repository field in your config specifies where the engine pulls weights from. The source field accepts the following providers:
  • S3: Amazon S3 buckets.
  • GCS: Google Cloud Storage.
  • AZURE: Azure Blob Storage.
  • HF: Hugging Face repositories.
Here’s a minimal example using S3:
config.yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: S3  # or GCS, AZURE, HF
      repo: s3://your-bucket/path/to/model/
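
The repo value follows each provider's addressing scheme. As a sketch for a GCS source, assuming it mirrors the S3 example with a gs:// URI (bucket and path are placeholders):
config.yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: GCS
      repo: gs://your-bucket/path/to/model/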

Private storage credentials

To access private storage, add a JSON secret to your Baseten secrets manager and reference it with runtime_secret_name in your config.
For S3, add a secret named aws_secret_json containing your AWS credentials:
{
  "aws_access_key_id": "XXXXX",
  "aws_secret_access_key": "xxxxx/xxxxxx",
  "aws_region": "us-west-2"
}
Then reference the secret in your config:
config.yaml
secrets:
  aws_secret_json: "set token in baseten workspace"  # literal placeholder; the real value is read from your workspace secrets at runtime
trt_llm:
  build:
    checkpoint_repository:
      source: S3
      repo: s3://your-bucket/path/to/model
      runtime_secret_name: aws_secret_json
See AWS S3 authentication for full setup details, including OIDC.
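
The same pattern extends to the other providers. As a sketch for a gated Hugging Face repository, assuming a workspace secret named hf_access_token that holds a Hugging Face access token (the model ID is illustrative):
config.yaml
secrets:
  hf_access_token: "set token in baseten workspace"
trt_llm:
  build:
    checkpoint_repository:
      source: HF
      repo: meta-llama/Llama-3.1-8B-Instruct
      runtime_secret_name: hf_access_token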