The config.yaml file defines how your model runs on Baseten: its dependencies,
compute resources, secrets, and runtime behavior. You specify what your model
needs; Baseten handles the infrastructure.
Every Truss includes a config.yaml in its root directory. Configuration is
optional; every value has a sensible default.
Common configuration tasks include:
- Allocate GPU and memory: compute resources for your instance.
- Declare environment variables: environment variables for your model.
- Configure concurrency: parallel request handling.
- Use a custom Docker image: deploy pre-built inference servers.
YAML syntax
If you’re new to YAML, here’s a quick primer.
The default config uses
[] for empty lists and {} for empty dictionaries.
When adding values, the syntax changes to indented lines:
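For instance (the field names and values below are illustrative):

```yaml
# Empty values use inline syntax:
#   requirements: []
#   environment_variables: {}
# With values added, the syntax switches to indented lines:
requirements:
  - torch==2.3.1
environment_variables:
  HF_HUB_ENABLE_HF_TRANSFER: "1"
```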
Example
The following example shows a config file for a GPU-accelerated text generation model:
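A minimal sketch of such a config.yaml; the model name, dependencies, and instance size are illustrative choices:

```yaml
model_name: GPU text generation model
python_version: py311
requirements:
  - torch==2.3.1
  - transformers==4.42.0
  - accelerate==0.31.0
resources:
  accelerator: A10G
  cpu: "4"
  memory: 16Gi
```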
Reference
The name of your model.
This is displayed in the model details page in the Baseten UI.
A description of your model.
The name of the class that defines your Truss model.
This class must implement at least a
predict method.
The folder containing your model class.
The folder for data files in your Truss. Access it from your model code in model/model.py.
The folder for custom packages in your Truss. Place your own code here and import it from model/model.py like any other package.
Use external_package_dirs to access custom packages located outside your Truss.
This lets multiple Trusses share the same package. For example, for a project structure where shared_utils/ sits outside the Truss, list its path under external_package_dirs in config.yaml, then import the package in model.py as usual.
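A sketch of the config.yaml side, assuming shared_utils/ lives one directory above the Truss:

```yaml
external_package_dirs:
  - ../shared_utils
```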
Key-value pairs exposed to the environment that the model executes in.
Many Python libraries can be customized using environment variables.
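For example (the specific variables are illustrative):

```yaml
environment_variables:
  HF_HUB_ENABLE_HF_TRANSFER: "1"
  TOKENIZERS_PARALLELISM: "false"
```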
A flexible field for additional metadata.
The entire config file is available to your model at runtime. Reserved keys that Baseten interprets:
- example_model_input: sample input that populates the Baseten playground.
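A sketch using the reserved key; the prompt field is an assumed input shape for illustration:

```yaml
model_metadata:
  example_model_input:
    prompt: "What is the meaning of life?"
```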
The path to a requirements file with Python dependencies.
Pin versions for reproducibility.
A list of Python dependencies in pip requirements file format.
Installed after requirements_file. For example, to install pinned versions of the dependencies, use the following:
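(The packages and versions shown are illustrative.)

```yaml
requirements:
  - torch==2.3.1
  - transformers==4.42.0
  - sentencepiece==0.2.0
```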
System packages that you would typically install using apt on a Debian operating system.
The Python version to use.
Supported versions: py39, py310, py311, py312, py313, py314.
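For example, to select Python 3.11 and install a couple of apt packages (the package choices are illustrative):

```yaml
python_version: py311
system_packages:
  - ffmpeg
  - libsndfile1
```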
Declare secrets your model needs at runtime, such as API keys or access tokens.
Store the actual values in your workspace settings. For more information, see Secrets.
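For example, to declare a Hugging Face token without embedding its value (the secret name is illustrative; the real value lives in your workspace settings):

```yaml
secrets:
  hf_access_token: null
```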
The path to a file containing example inputs for your model.
If true, changes to your model code are automatically reloaded without restarting the server. Useful for development.
Whether to apply library patches for improved compatibility.
resources
The resources section specifies the compute resources that your model needs, including CPU, memory, and GPU resources.
For example, to allocate an A10G GPU with 4 CPUs and 20 GB of memory, use the following:
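(A minimal sketch; Gi denotes gibibytes, as noted below.)

```yaml
resources:
  accelerator: A10G
  cpu: "4"
  memory: 20Gi
```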
CPU resources needed, expressed as either a raw number or "millicpus".
For example,
1000m and 1 are equivalent.
Fractional CPU amounts can be requested using millicpus.
For example, 500m is half of a CPU core.
CPU RAM needed, expressed as a number with units.
Units include "Gi" (Gibibytes), "G" (Gigabytes), "Mi" (Mebibytes), and "M" (Megabytes).
For example,
1Gi and 1024Mi are equivalent. Gi in resources.memory refers to Gibibytes, which are slightly larger than Gigabytes.
The GPU type for your instance. Available GPUs: T4, L4, A10G, V100, A100, A100_40GB, H100, H100_40GB, H200. For more information, see how to Manage resources.
Use the : operator to request multiple GPUs, for example:
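(Sketch; the GPU type and count are illustrative.)

```yaml
resources:
  accelerator: H100:8
```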
The number of nodes for multi-node deployments. Each node gets the specified resources.
runtime
Runtime settings for your model instance. For example, to configure a high-throughput inference server with concurrency and health checks, use the following:
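A sketch; the health check field names are assumed to match the descriptions later in this section:

```yaml
runtime:
  predict_concurrency: 8             # handle up to 8 requests in predict at once
  streaming_read_timeout: 120        # seconds
  health_checks:                     # field names assumed from the descriptions below
    restart_check_delay_seconds: 300
    restart_threshold_seconds: 600
    stop_traffic_threshold_seconds: 120
```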
The number of concurrent requests that can run in your model's predict method. Default is 1, meaning predict runs one request at a time. Increase this if your model supports parallelism. See How to configure concurrency for more detail.
The timeout in seconds for streaming read operations.
If true, enables trace data export with built-in OTEL instrumentation. By default, data is collected internally by Baseten for troubleshooting. You can also export to your own systems. See the tracing guide. May add performance overhead.
If true, sets the Truss server log level to DEBUG instead of INFO.
The transport protocol for your model. Supports http (default), websocket, and grpc.
Custom health check configuration for your deployments. For details, see Configuring health checks.
The delay in seconds before starting restart checks.
The time in seconds after which an unhealthy instance is restarted.
The time in seconds after which traffic is stopped to an unhealthy instance.
base_image
Use base_image to deploy a custom Docker image. This is useful for running scripts at build time or installing complex dependencies.
For more information, see Deploy custom Docker images.
For example, to use the vLLM Docker image as your base, use the following:
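A sketch (the image tag is illustrative; pin a digest for exact reproducibility):

```yaml
base_image:
  image: vllm/vllm-openai:v0.6.2
  python_executable_path: /usr/bin/python3
```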
The path to the Docker image, for example:
- vllm/vllm-openai
- lmsysorg/sglang
- nvcr.io/nvidia/nemo:23.03
When using image tags like :latest, Baseten may use a cached copy even if the image has been updated. To pull the exact version, use image digests like your-image@sha256:abc123....
The path to the Python executable on the image, for example /usr/bin/python.
Authentication configuration for a private Docker registry. For more information, see Private Docker registries.
The authentication method for the private registry. Supported values:
- GCP_SERVICE_ACCOUNT_JSON: authenticate with a GCP service account. Add your service account JSON blob as a Truss secret.
- AWS_IAM: authenticate with an AWS IAM service account. Add aws_access_key_id and aws_secret_access_key to your Baseten secrets.
The Truss secret that stores the credential for authentication. Required for GCP_SERVICE_ACCOUNT_JSON. Ensure this secret is added to the secrets section.
The registry to authenticate to (e.g., us-east4-docker.pkg.dev).
The secret name for the AWS access key ID. Only used with the AWS_IAM auth method.
The secret name for the AWS secret access key. Only used with the AWS_IAM auth method.
docker_server
Use docker_server to deploy a custom Docker image that has its own HTTP server, without writing a Model class. This is useful for deploying inference servers like vLLM or SGLang that provide their own endpoints.
See Deploy custom Docker images for usage details.
For example, to deploy vLLM serving Qwen 2.5 3B, use the following:
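A sketch, assuming vLLM's OpenAI-compatible server and its standard /health endpoint; the model, port, and image tag are illustrative:

```yaml
base_image:
  image: vllm/vllm-openai:v0.6.2
docker_server:
  start_command: vllm serve Qwen/Qwen2.5-3B-Instruct --port 8000
  server_port: 8000
  predict_endpoint: /v1/chat/completions
  readiness_endpoint: /health
  liveness_endpoint: /health
```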
The command to start the server. Required when
no_build is not true.
The port where the server runs.
The endpoint for inference requests. This is mapped to Baseten's
/predict route.
The endpoint for readiness probes. Determines when the container can accept traffic.
The endpoint for liveness probes. Determines if the container needs to be restarted.
The user ID to run the server as inside the container.
If true, skip the build step and use the image as-is. When true,
start_command is not required.
The /app directory is reserved by Baseten. Only /app and /tmp are writable in the container.
external_data
Use external_data to bundle data into your image at build time. This reduces cold-start time by making data available without downloading it at runtime.
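For example (the URL and paths are placeholders):

```yaml
external_data:
  - url: https://example.com/weights/model.safetensors
    local_data_path: data/model.safetensors
    name: model-weights
```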
The URL to download data from.
The path on the image where the data will be downloaded to.
A name for the data, useful for readability purposes.
The download backend to use.
build_commands
A list of commands to run at build time.
Useful for performing one-off bash commands, such as cloning a GitHub repository or installing Ollama into the container at build time. For more information, see Build commands.
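For example (the repository URL is a placeholder; the Ollama line uses the vendor's standard install script):

```yaml
build_commands:
  - git clone https://github.com/example-org/example-repo.git
  - curl -fsSL https://ollama.com/install.sh | sh
```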
build
The build section handles secret access during Docker builds.
Other build-time configuration options are:
- build_commands: shell commands to run during build.
- requirements: Python packages to install.
- system_packages: apt packages to install.
- base_image: custom Docker base image.
Grants access to secrets during the build.
Provide a mapping between a secret and a path on the image.
You can then access the secret in commands specified in build_commands by running cat on the file.
Under the hood, this option mounts your secret as a build secret.
The value of your secret will be secure and will not be exposed in your Docker history or logs.
For example, to install a pip package from a private GitHub repository, use the following:
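A sketch, assuming a workspace secret named github_token and a secret_to_path_mapping field for the mount path; the repository is a placeholder:

```yaml
build:
  secret_to_path_mapping:
    github_token: /run/secrets/github_token
build_commands:
  - pip install git+https://$(cat /run/secrets/github_token)@github.com/example-org/private-repo.git
```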
model_cache
Use model_cache to bundle model weights into your image at build time, reducing cold start latency.
For example, to cache Llama 2 7B weights from Hugging Face, use the following:
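A sketch; the revision, volume folder, and secret name are illustrative:

```yaml
model_cache:
  - repo_id: meta-llama/Llama-2-7b-chat-hf
    revision: main
    use_volume: true
    volume_folder: llama-2-7b
    runtime_secret_name: hf_access_token
```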
Despite the name
model_cache, there are multiple backends supported, not just Hugging Face.
You can also cache weights stored on GCS, S3, or Azure.
The source path for your model weights. For example, to cache weights from a Hugging Face repo, or from buckets like GCS or S3, use the following:
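For example (bucket names are placeholders):

```yaml
model_cache:
  - repo_id: meta-llama/Llama-2-7b-chat-hf   # Hugging Face repo
  - repo_id: gs://my-bucket/llama-2-7b       # GCS bucket
  - repo_id: s3://my-bucket/llama-2-7b       # S3 bucket
```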
The source kind for the model cache.
Supported values:
hf (Hugging Face), gcs, s3, azure.
The revision of your Hugging Face repo.
Required when
use_volume is true for Hugging Face repos.
If true, caches model artifacts outside the container image. Recommended: true.
The location of the mounted folder. Required when
use_volume is true.
For example, volume_folder: myrepo makes the model available under /app/model_cache/myrepo at runtime.
File patterns to include in the cache. Uses Unix shell-style wildcards.
By default, all paths are included.
File patterns to ignore, streamlining the caching process. Use Unix shell-style wildcards. Example:
["*.onnx", "Readme.md"]. By default, nothing is ignored.The secret name to use for runtime authentication (e.g., for private Hugging Face repos).
training_checkpoints
Configuration for deploying models from training checkpoints. For example, to deploy a model using checkpoints from a training job, use the following:
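A sketch; the field names are assumed from the descriptions below and the job ID is a placeholder:

```yaml
training_checkpoints:
  download_folder: /tmp/training_checkpoints
  artifact_references:
    - training_job_id: abcd1234          # placeholder job ID
      paths:
        - "checkpoint-*"
```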
The folder to download the checkpoints to.
A list of artifact references to download.
The training job ID that the artifact reference belongs to.
The paths of the files to download, which can contain
* or ? wildcards.