Deploying from cloud storage lets you serve models from the infrastructure you already have. The engine pulls weights from your storage at build time, compiles them with TensorRT-LLM, and serves the result as a production endpoint; you don't need to move or re-upload anything. Engine-Builder-LLM, BEI, and BIS-LLM all support this workflow.
To deploy from Baseten Training checkpoints instead, see Deploy with optimized inference engines.

Storage sources

The checkpoint_repository field in your config specifies where the engine pulls weights from. The source field accepts the following providers:
  • S3: Amazon S3 buckets.
  • GCS: Google Cloud Storage.
  • AZURE: Azure Blob Storage.
  • HF: Hugging Face repositories.
Here’s a minimal example using S3:
config.yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: S3  # or GCS, AZURE, HF
      repo: s3://your-bucket/path/to/model/
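
The repo value follows each provider's addressing scheme. As a sketch for a GCS source, assuming it mirrors the S3 example with a gs:// URI (bucket and path are placeholders):
config.yaml
trt_llm:
  build:
    base_model: decoder
    checkpoint_repository:
      source: GCS
      repo: gs://your-bucket/path/to/model/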

Private storage credentials

To access private storage, add a JSON secret to your Baseten secrets manager and reference it with runtime_secret_name in your config.
For S3, add a secret named aws_secret_json containing your AWS credentials:
{
  "aws_access_key_id": "XXXXX",
  "aws_secret_access_key": "xxxxx/xxxxxx",
  "aws_region": "us-west-2"
}
Then reference the secret in your config:
config.yaml
secrets:
  aws_secret_json: "set token in baseten workspace"  # literal placeholder; the real value is read from your workspace secrets at runtime
trt_llm:
  build:
    checkpoint_repository:
      source: S3
      repo: s3://your-bucket/path/to/model
      runtime_secret_name: aws_secret_json
See AWS S3 authentication for full setup details, including OIDC.
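
The same pattern extends to the other providers. As a sketch for a gated Hugging Face repository, assuming a workspace secret named hf_access_token that holds a Hugging Face access token (the model ID is illustrative):
config.yaml
secrets:
  hf_access_token: "set token in baseten workspace"
trt_llm:
  build:
    checkpoint_repository:
      source: HF
      repo: meta-llama/Llama-3.1-8B-Instruct
      runtime_secret_name: hf_access_token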