The config.yaml file defines how your model runs on Baseten: its dependencies, compute resources, secrets, and runtime behavior. You specify what your model needs; Baseten handles the infrastructure. Every Truss includes a config.yaml in its root directory. Configuration is optional; every value has a sensible default.
If you’re new to YAML, here’s a quick primer. The default config uses [] for empty lists and {} for empty dictionaries. When adding values, the syntax changes to indented lines:
# Empty
requirements: []
secrets: {}

# With values
requirements:
  - torch
  - transformers
secrets:
  hf_access_token: null

Example

The following example shows a config file for a GPU-accelerated text generation model:
config.yaml
model_name: my-llm
description: A text generation model.
requirements:
  - torch
  - transformers
  - accelerate
resources:
  cpu: "4"
  memory: 16Gi
  accelerator: L4
secrets:
  hf_access_token: null
For more examples, see the truss-examples repository.

Reference

model_name
string
The name of your model. This is displayed in the model details page in the Baseten UI.
description
string
A description of your model.
model_class_name
string
default:"Model"
The name of the class that defines your Truss model. This class must implement at least a predict method.
model_module_dir
string
default:"model"
The folder containing your model class.
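For example, if your model class lives in a folder named inference and is called TextGenerator (illustrative names), set both fields together:
model_class_name: TextGenerator
model_module_dir: inference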
data_dir
string
default:"data/"
The folder for data files in your Truss. Access it in your model:
model/model.py
class Model:
    def __init__(self, **kwargs):
        # Path to the Truss's bundled data directory
        self._data_dir = kwargs["data_dir"]

    # ...
bundled_packages_dir
string
default:"packages/"
The folder for custom packages in your Truss. Place your own code here to reference in model.py. For example, with this project structure:
stable-diffusion/
    packages/
        package_1/
            subpackage/
                script.py
    model/
        model.py
        __init__.py
    config.yaml
Inside model.py, the package can be imported like this:
model/model.py
from package_1.subpackage.script import run_script

class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        run_script()

    ...
external_package_dirs
string[]
Use external_package_dirs to access custom packages located outside your Truss. This lets multiple Trusses share the same package. The following example shows a project structure where shared_utils/ is outside the Truss:
my-model/
    model/
        model.py
    config.yaml
shared_utils/
    helpers.py
Specify the path in your config.yaml:
config.yaml
external_package_dirs:
  - ../shared_utils/
Then import the package in your model.py:
model.py
from shared_utils.helpers import process_input

class Model:
    def predict(self, model_input):
        return process_input(model_input)
environment_variables
object
Key-value pairs exposed to the environment that the model executes in. Many Python libraries can be customized using environment variables.
Do not store secret values directly in environment variables (or anywhere in the config file). See the secrets field for information on properly managing secrets.
environment_variables:
  ENVIRONMENT: Staging
  DB_URL: https://my_database.example.com/
model_metadata
object
A flexible field for additional metadata. The entire config file is available to your model at runtime. Reserved keys that Baseten interprets:
  • example_model_input: sample input that pre-populates the Baseten playground.
For example, to configure a model with playground input and custom vLLM settings, use the following:
model_metadata:
  example_model_input: {"prompt": "What is the meaning of life?"}
  vllm_config:
    tensor_parallel_size: 1
    max_model_len: 4096
requirements_file
string
The path to a requirements file with Python dependencies. Pin versions for reproducibility.
requirements_file: ./requirements.txt
requirements
string[]
A list of Python dependencies in pip requirements file format. These are installed after requirements_file. For example, to install pinned versions of the dependencies, use the following:
requirements:
  - scikit-learn==1.0.2
  - threadpoolctl==3.0.0
  - joblib==1.1.0
  - numpy==1.20.3
  - scipy==1.7.3
system_packages
string[]
System packages that you would typically install using apt on a Debian operating system.
system_packages:
  - ffmpeg
  - libsm6
  - libxext6
python_version
string
default:"py39"
The Python version to use. Supported versions:
  • py39
  • py310
  • py311
  • py312
  • py313
  • py314
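For example, to run your model on Python 3.11:
python_version: py311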
secrets
object
Declare secrets your model needs at runtime, such as API keys or access tokens. Store the actual values in your workspace settings.
Never store actual secret values in config. Use null as a placeholder—the key name must match the secret name in your workspace.
secrets:
  hf_access_token: null
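At runtime, declared secrets are passed to your model's constructor. A minimal sketch of reading the hf_access_token secret declared above, assuming the standard Truss secrets keyword argument:
model/model.py
class Model:
    def __init__(self, **kwargs):
        # Values come from your workspace settings, not from config.yaml
        self._hf_token = kwargs["secrets"]["hf_access_token"]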
For more information, see Secrets.
examples_filename
string
default:"examples.yaml"
The path to a file containing example inputs for your model.
live_reload
boolean
default:"false"
If true, changes to your model code are automatically reloaded without restarting the server. Useful for development.
apply_library_patches
boolean
default:"true"
Whether to apply library patches for improved compatibility.

resources

The resources section specifies the compute resources that your model needs, including CPU, memory, and GPU resources. For example, to allocate an A10G GPU with 4 CPUs and 20 GB of memory, use the following:
resources:
  accelerator: A10G
  cpu: "4"
  memory: 20Gi
  use_gpu: true
cpu
string
default:"1"
CPU resources needed, expressed either as a whole number of cores or in millicpus; 1000m and 1 are equivalent. Use millicpus to request fractional CPU amounts.
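For example, 500m requests half of a CPU core:
resources:
  cpu: "500m"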
memory
string
default:"2Gi"
CPU RAM needed, expressed as a number with units. Units include "Gi" (Gibibytes), "G" (Gigabytes), "Mi" (Mebibytes), and "M" (Megabytes). For example, 1Gi and 1024Mi are equivalent.
Gi in resources.memory refers to Gibibytes, which are slightly larger than Gigabytes.
accelerator
string
The GPU type for your instance. Available GPUs:
  • T4
  • L4
  • A10G
  • V100
  • A100
  • A100_40GB
  • H100
  • H100_40GB
  • H200
If your model requires multiple GPUs (for example, if the weights don't fit on a single GPU), use the : operator to request them:
resources:
  accelerator: L4:4 # Requests 4 L4s
For more information, see how to Manage resources.
node_count
number
The number of nodes for multi-node deployments. Each node gets the specified resources.
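For example, a sketch of a two-node deployment where each node gets eight H100s (the accelerator choice is illustrative):
resources:
  accelerator: H100:8
  node_count: 2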

runtime

Runtime settings for your model instance. For example, to configure a high-throughput inference server with concurrency and health checks, use the following:
runtime:
  predict_concurrency: 256
  streaming_read_timeout: 120
  health_checks:
    restart_threshold_seconds: 600
    stop_traffic_threshold_seconds: 300
predict_concurrency
number
default:"1"
The number of concurrent requests that can run in your model's predict method. The default of 1 means predict handles one request at a time; increase it if your model supports parallelism. See How to configure concurrency for more detail.
streaming_read_timeout
number
default:"60"
The timeout in seconds for streaming read operations.
enable_tracing_data
boolean
default:"false"
If true, enables trace data export with built-in OTEL instrumentation. By default, data is collected internally by Baseten for troubleshooting. You can also export to your own systems. See the tracing guide. May add performance overhead.
enable_debug_logs
boolean
default:"false"
If true, sets the Truss server log level to DEBUG instead of INFO.
transport
object
The transport protocol for your model. Supports http (default), websocket, and grpc.
runtime:
  transport:
    kind: websocket
    ping_interval_seconds: 30
    ping_timeout_seconds: 10
health_checks
object
Custom health check configuration for your deployments. For details, see Configuring health checks.
runtime:
  health_checks:
    restart_check_delay_seconds: 0
    restart_threshold_seconds: 1800
    stop_traffic_threshold_seconds: 1800
restart_check_delay_seconds
number
default:"0"
The delay in seconds before starting restart checks.
restart_threshold_seconds
number
default:"1800"
The time in seconds after which an unhealthy instance is restarted.
stop_traffic_threshold_seconds
number
default:"1800"
The time in seconds after which traffic is stopped to an unhealthy instance.

base_image

Use base_image to deploy a custom Docker image. This is useful for running scripts at build time or installing complex dependencies. For more information, see Deploy custom Docker images. For example, to use the vLLM Docker image as your base, use the following:
base_image:
  image: vllm/vllm-openai:v0.7.3
  python_executable_path: /usr/bin/python
# ...
image
string
The path to the Docker image, for example:
  • vllm/vllm-openai
  • lmsysorg/sglang
  • nvcr.io/nvidia/nemo:23.03
When using image tags like :latest, Baseten may use a cached copy even if the image has been updated. To pull the exact version, use image digests like your-image@sha256:abc123....
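For example, to pin the base image to an exact version by digest (placeholder digest shown):
base_image:
  image: vllm/vllm-openai@sha256:<digest>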
python_executable_path
string
A path to the Python executable on the image, for example /usr/bin/python.
base_image:
  image: vllm/vllm-openai:latest
  python_executable_path: /usr/bin/python
docker_auth
object
Authentication configuration for a private Docker registry.
base_image:
  docker_auth:
    auth_method: GCP_SERVICE_ACCOUNT_JSON
    secret_name: gcp-service-account
    registry: us-west2-docker.pkg.dev
For more information, see Private Docker registries.
auth_method
string
The authentication method for the private registry. Supported values:
  • GCP_SERVICE_ACCOUNT_JSON - authenticate with a GCP service account. Add your service account JSON blob as a Truss secret.
  • AWS_IAM - authenticate with an AWS IAM service account. Add aws_access_key_id and aws_secret_access_key to your Baseten secrets.
For GCP_SERVICE_ACCOUNT_JSON:
base_image:
  docker_auth:
    auth_method: GCP_SERVICE_ACCOUNT_JSON
    secret_name: gcp-service-account
    registry: us-east4-docker.pkg.dev
For AWS_IAM:
base_image:
  docker_auth:
    auth_method: AWS_IAM
    registry: <aws account id>.dkr.ecr.<region>.amazonaws.com
secrets:
  aws_access_key_id: null
  aws_secret_access_key: null
secret_name
string
The Truss secret that stores the credential for authentication. Required for GCP_SERVICE_ACCOUNT_JSON. Ensure this secret is added to the secrets section.
registry
string
The registry to authenticate to (e.g., us-east4-docker.pkg.dev).
aws_access_key_id_secret_name
string
default:"aws_access_key_id"
The secret name for the AWS access key ID. Only used with AWS_IAM auth method.
aws_secret_access_key_secret_name
string
default:"aws_secret_access_key"
The secret name for the AWS secret access key. Only used with AWS_IAM auth method.
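For example, to point AWS_IAM authentication at custom secret names (the names here are illustrative):
base_image:
  docker_auth:
    auth_method: AWS_IAM
    registry: <aws account id>.dkr.ecr.<region>.amazonaws.com
    aws_access_key_id_secret_name: my_aws_access_key_id
    aws_secret_access_key_secret_name: my_aws_secret_access_key
secrets:
  my_aws_access_key_id: null
  my_aws_secret_access_key: null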

docker_server

Use docker_server to deploy a custom Docker image that has its own HTTP server, without writing a Model class. This is useful for deploying inference servers like vLLM or SGLang that provide their own endpoints. See Deploy custom Docker images for usage details. For example, to deploy vLLM serving Qwen 2.5 3B, use the following:
base_image:
  image: vllm/vllm-openai:v0.7.3
docker_server:
  start_command: vllm serve Qwen/Qwen2.5-3B-Instruct --enable-prefix-caching
  readiness_endpoint: /health
  liveness_endpoint: /health
  predict_endpoint: /v1/completions
  server_port: 8000
# ...
start_command
string
required
The command to start the server. Required when no_build is not true.
server_port
number
required
The port where the server runs.
predict_endpoint
string
required
The endpoint for inference requests. This is mapped to Baseten's /predict route.
readiness_endpoint
string
required
The endpoint for readiness probes. Determines when the container can accept traffic.
liveness_endpoint
string
required
The endpoint for liveness probes. Determines if the container needs to be restarted.
run_as_user_id
number
The user ID to run the server as inside the container.
no_build
boolean
If true, skip the build step and use the image as-is. When true, start_command is not required.
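For example, a minimal sketch that runs a prebuilt image as-is, assuming the image's own entrypoint starts the server and exposes the endpoints shown:
base_image:
  image: vllm/vllm-openai:v0.7.3
docker_server:
  no_build: true
  readiness_endpoint: /health
  liveness_endpoint: /health
  predict_endpoint: /v1/completions
  server_port: 8000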
The /app directory is reserved by Baseten. Only /app and /tmp are writable in the container.

external_data

Use external_data to bundle data into your image at build time. This reduces cold-start time by making data available without downloading it at runtime.
external_data:
  - url: https://my-bucket.s3.amazonaws.com/my-data.tar.gz
    local_data_path: data/my-data.tar.gz
    name: my-data
url
string
required
The URL to download data from.
local_data_path
string
required
The path on the image where the data will be downloaded to.
name
string
A name for the data, useful for readability purposes.
backend
string
default:"http_public"
The download backend to use.

build_commands

build_commands
string[]
A list of commands to run at build time. Useful for performing one-off bash commands. For example, to clone a GitHub repository, use the following:
build_commands:
  - git clone https://github.com/comfyanonymous/ComfyUI.git
Or to install Ollama into the container at build time, use the following:
model_name: ollama-tinyllama
base_image:
  image: python:3.11-slim
build_commands:
  - curl -fsSL https://ollama.com/install.sh | sh
docker_server:
  start_command: sh -c "ollama serve & sleep 5 && ollama pull tinyllama && wait"
  readiness_endpoint: /api/tags
  liveness_endpoint: /api/tags
  predict_endpoint: /api/generate
  server_port: 11434
resources:
  cpu: "4"
  memory: 8Gi
For more information, see Build commands.

build

The build section configures build-time options, such as granting access to secrets during Docker builds.
secret_to_path_mapping
object
Grants access to secrets during the build. Provide a mapping between a secret and a path on the image. You can then access the secret in commands specified in build_commands by running cat on the file. For example, to install a pip package from a private GitHub repository, use the following:
build_commands:
  - pip install git+https://$(cat /root/my-github-access-token)@github.com/path/to-private-repo.git
build:
  secret_to_path_mapping:
    my-github-access-token: /root/my-github-access-token
secrets:
  my-github-access-token: null
Under the hood, this option mounts your secret as a build secret. The value of your secret will be secure and will not be exposed in your Docker history or logs.

model_cache

Use model_cache to bundle model weights into your image at build time, reducing cold start latency. For example, to cache Llama 2 7B weights from Hugging Face, use the following:
model_cache:
  - repo_id: NousResearch/Llama-2-7b-chat-hf
    revision: main
    ignore_patterns:
      - "*.bin"
    use_volume: true
    volume_folder: llama-2-7b-chat-hf
Despite the name, model_cache supports multiple backends, not just Hugging Face. You can also cache weights stored in GCS, S3, or Azure.
repo_id
string
required
The source path for your model weights. For example, to cache weights from a Hugging Face repo, use the following:
model_cache:
  - repo_id: madebyollin/sdxl-vae-fp16-fix
Or you can cache weights from buckets like GCS or S3, using the following options:
model_cache:
  - repo_id: gcs://path-to-my-bucket
    kind: gcs
  - repo_id: s3://path-to-my-bucket
    kind: s3
kind
string
default:"hf"
The source kind for the model cache. Supported values: hf (Hugging Face), gcs, s3, azure.
revision
string
The revision of your Hugging Face repo. Required when use_volume is true for Hugging Face repos.
use_volume
boolean
required
If true, caches model artifacts outside the container image. Recommended: true.
volume_folder
string
The location of the mounted folder. Required when use_volume is true. For example, volume_folder: myrepo makes the model available under /app/model_cache/myrepo at runtime.
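At runtime, you can load weights directly from the mounted path. A minimal sketch in model.py, assuming volume_folder: llama-2-7b-chat-hf, the transformers package in requirements, and that the weights are available by the time load runs:
model/model.py
from transformers import AutoModelForCausalLM

class Model:
    def load(self):
        # use_volume weights are mounted under /app/model_cache/<volume_folder>
        self._model = AutoModelForCausalLM.from_pretrained(
            "/app/model_cache/llama-2-7b-chat-hf"
        )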
allow_patterns
string[]
File patterns to include in the cache. Uses Unix shell-style wildcards. By default, all paths are included.
ignore_patterns
string[]
File patterns to ignore, streamlining the caching process. Use Unix shell-style wildcards. Example: ["*.onnx", "Readme.md"]. By default, nothing is ignored.
runtime_secret_name
string
default:"hf_access_token"
The secret name to use for runtime authentication (e.g., for private Hugging Face repos).

training_checkpoints

Configuration for deploying models from training checkpoints. For example, to deploy a model using checkpoints from a training job, use the following:
training_checkpoints:
  download_folder: /tmp/training_checkpoints
  artifact_references:
    - training_job_id: tr_abc123
      paths:
        - "checkpoint-*"
download_folder
string
default:"/tmp/training_checkpoints"
The folder to download the checkpoints to.
artifact_references
object[]
A list of artifact references to download.
training_job_id
string
required
The training job ID that the artifact reference belongs to.
paths
string[]
The paths of the files to download, which can contain * or ? wildcards.