When you write a Model class, Truss uses the Truss server base image by default. However, you can deploy pre-built containers.
This guide covers deploying custom Docker containers with Truss.
Custom Docker containers
To deploy a custom Docker container, set base_image to your image and use the docker_server argument to specify how to run it.
config.yaml
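The original snippet isn't reproduced here; a minimal sketch (the image name and start command are placeholders) might look like:

```yaml
base_image:
  image: your-registry/your-server:latest   # placeholder image
docker_server:
  start_command: python server.py           # placeholder start command
  server_port: 8000
  predict_endpoint: /predict
  readiness_endpoint: /health
  liveness_endpoint: /health
```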
- image: The Docker image to use.
- start_command: The command to start the server. This overrides the base image’s default entrypoint.
- server_port: The port to listen on.
- predict_endpoint: The endpoint to forward requests to.
- readiness_endpoint: The endpoint to check if the server is ready.
- liveness_endpoint: The endpoint to check if the server is alive.
Non-root user
Containers run as a non-root user by default:

| Property | Value |
|---|---|
| Username | app |
| UID / GID | 60000 |
| Home directory | /home/app |
To run as a different user, set run_as_user_id under docker_server:
config.yaml
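The original snippet is omitted; setting the UID might look like:

```yaml
docker_server:
  # ...other docker_server settings...
  run_as_user_id: 1000
```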
User IDs 0 (root) and 60000 (the platform default) are not allowed.
Many NVIDIA base images, including NIM and Triton, run as user ID 1000. Set run_as_user_id: 1000 when using these images.

When run_as_user_id is set, Baseten chowns /app, /workspace, the packages directory, and $HOME to this UID. If your server writes to directories outside of these, ensure they are writable by the specified UID in your base image or via build_commands.
Endpoint mapping
While predict_endpoint maps your server’s inference route to Baseten’s /predict endpoint, you can access any route exposed by your server using the sync endpoint.

| Baseten endpoint | Maps to |
|---|---|
| /environments/production/predict | Your predict_endpoint route |
| /environments/production/sync/{any/route} | /{any/route} in your server |

Example: If you set predict_endpoint: /v1/chat/completions:

| Baseten endpoint | Maps to |
|---|---|
| /environments/production/predict | /v1/chat/completions |
| /environments/production/sync/v1/models | /v1/models |

All other paths reach your server unchanged, including routes like /metrics and /health. If your server doesn’t handle a requested path, the reverse proxy returns whatever response your server returns (often its own 404).

Container filesystem
Writable directories
Your server process can write to these paths:

| Path | Purpose |
|---|---|
/app | Application root, including your config.yaml and optional data/ |
/home/app | Home directory ($HOME) |
/tmp | Temporary files |
/workspace | General-purpose scratch space |
/packages | Bundled packages |
If your server writes outside these paths, make the target directories writable in your base image or via build_commands, or set run_as_user_id so Baseten chowns the managed paths to your UID.
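For example, a hypothetical cache directory could be made writable at build time (the path is a placeholder; 60000 is the platform default UID):

```yaml
build_commands:
  - mkdir -p /opt/cache && chown 60000:60000 /opt/cache
```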
Working directory
Truss does not set a WORKDIR for custom server builds. The effective working directory is whatever your base image defines (often /).
If your server expects a specific working directory, set it in your start_command:
config.yaml
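The original snippet isn't shown; one way to do this (paths and server command are placeholders) is to cd inside start_command:

```yaml
docker_server:
  start_command: sh -c "cd /app && python server.py"
```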
Secrets
Secrets declared in config.yaml are mounted as read-only files at /secrets/{secret_name}. See Secrets in custom Docker images for usage.
Runtime environment
Baseten sets specific environment variables in every custom-server container to route traffic to your server, identify the container in logs and traces, and keep its runtime path intact. These names are reserved. If you set any of them in environment_variables, Baseten drops the value before deploying the container:

- PORT
- HOST, HOSTNAME
- *_SERVICE_HOST, *_SERVICE_PORT
- *PATH

In particular, do not rely on setting custom values for PORT or HOSTNAME.
Platform-injected environment variables
Baseten sets these in every container at runtime:

| Variable | Value |
|---|---|
APP_HOME | /app |
HOME | /home/app (or /root if running as root) |
PYTHON_EXECUTABLE | Path to python3 in the base image |
BT_MODEL_ID | The model’s ID |
BT_MODEL_DEPLOYMENT_ID | The deployment’s ID |
Read BT_MODEL_ID and BT_MODEL_DEPLOYMENT_ID from your server process to tag logs, metrics, or cache keys with deployment identity.
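For instance, a small helper (a sketch, not Baseten-provided code) could collect these for log tagging:

```python
import os

def deployment_tags() -> dict:
    """Return Baseten deployment identity for tagging logs, metrics, or cache keys."""
    return {
        "model_id": os.environ.get("BT_MODEL_ID", "unknown"),
        "deployment_id": os.environ.get("BT_MODEL_DEPLOYMENT_ID", "unknown"),
    }
```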
Base image environment variables
Environment variables baked into your base image (ENV UV_EXTRA_INDEX_URL=..., ENV PIP_CONSTRAINT=..., and so on) are visible to your server process at runtime. If your start_command or anything it invokes runs uv or pip, these inherited settings take effect. They don’t affect how Truss builds the container’s internal Python environment that runs the reverse proxy and process supervisor.
If you want a clean install environment inside start_command, unset the inherited variables before invoking uv or pip.
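As a sketch (the variable names are just the ones mentioned above; the install and server commands are placeholders):

```yaml
docker_server:
  start_command: sh -c "unset UV_EXTRA_INDEX_URL PIP_CONSTRAINT && pip install -r requirements.txt && python server.py"
```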
Base image requirements
Standard (non-no_build) custom-server builds require:

- A Debian-based base image (ID=debian or ID_LIKE=debian in /etc/os-release).
- Python 3.x on PATH. The minor version is validated at build time.
Deploy Ollama
This example deploys Ollama with the TinyLlama model using a custom Docker image. Ollama is a popular lightweight LLM inference server, similar to vLLM or SGLang. TinyLlama is small enough to run on a CPU.

1. Create the config
Create a config.yaml file with the following configuration:
config.yaml
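The original config block isn't reproduced here; a sketch consistent with the description below (the exact image tag, Ollama's default port 11434, and the sleep-based wait are assumptions) might look like:

```yaml
base_image:
  image: python:3.11-slim   # assumed lightweight Python image
docker_server:
  start_command: sh -c "ollama serve & sleep 5 && ollama pull tinyllama && wait"
  server_port: 11434        # Ollama's default port
  predict_endpoint: /api/generate
  readiness_endpoint: /api/tags
  liveness_endpoint: /api/tags
build_commands:
  - apt-get update && apt-get install -y curl ca-certificates zstd
  - curl -fsSL https://ollama.com/install.sh | sh
resources:
  cpu: "4"
  memory: 8Gi
```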
The base_image field specifies the Docker image to use as your starting
point, in this case a lightweight Python image. The build_commands section
first installs the system packages that the Ollama install script requires
(curl, ca-certificates, and zstd), then downloads and installs Ollama.
The slim base image doesn’t include these packages by default. You can also use
build_commands to install model weights or other dependencies.
The start_command launches the Ollama server, waits for it to initialize, and
then pulls the TinyLlama model.
The readiness_endpoint and liveness_endpoint
both point to /api/tags, which returns successfully when Ollama is running.
The predict_endpoint maps Baseten’s /predict route to Ollama’s
/api/generate endpoint.
Finally, declare your resource requirements. This example only needs 4 CPUs and
8GB of memory. For a complete list of resource options, see the
Resources page.
2. Deploy
To deploy the model, push the Truss from your project directory. Once the readiness_endpoint and liveness_endpoint return successfully, the model is ready to use.
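The push step is run with the Truss CLI, from the directory containing config.yaml:

```shell
truss push
```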
3. Run inference
Ollama uses OpenAI-API-compatible endpoints to run inference and calls /api/generate to generate text. Since you mapped the /predict route to Ollama’s /api/generate endpoint, you can run inference by calling the /predict endpoint.
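For example, with cURL (the model URL is a placeholder; the JSON body follows Ollama's /api/generate schema):

```shell
curl -X POST https://model-{model_id}.api.baseten.co/environments/production/predict \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{"model": "tinyllama", "prompt": "Why is the sky blue?", "stream": false}'
```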
You can call the endpoint with the Truss CLI, cURL, or Python. To run inference with Truss, use the predict command (truss predict).

Per-request logging
Baseten assigns a unique request ID to every predict call and returns it in the X-Baseten-Request-Id response header. You can use this ID to filter your model’s logs down to a single request.
For standard Truss models, request ID logging is automatic. For custom HTTP servers, you’ll need to extract the request ID from the incoming request header and include it in your JSON log output.
Extract the request ID from the X-Baseten-Request-Id header:
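The original FastAPI and Flask samples aren't reproduced here; a framework-neutral sketch of the logging side (you would call it from a middleware or request handler, passing the incoming request's headers) might be:

```python
import json

def format_log(headers: dict, message: str) -> str:
    """Build a JSON log line with the Baseten request ID as a top-level key.

    `headers` is the incoming request's header mapping; write the returned
    string to stdout so the log can be filtered by request ID.
    """
    record = {
        "request_id": headers.get("X-Baseten-Request-Id", ""),
        "message": message,
    }
    return json.dumps(record)
```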
Logs must be JSON formatted and written to stdout. The request_id field must be a top-level key in the JSON object.

No-build deployment
For security-hardened images that must remain completely unmodified, use no_build to skip the build step entirely. Baseten copies the image to its container registry without running docker build.
No-build is only available for custom server deployments. Your Truss must use docker_server configuration. Standard Truss models with a model.py don’t support no_build.
No-build deployments are not enabled by default. Contact support to enable this feature for your organization.
config.yaml
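The original block isn't shown; a sketch (the image is a placeholder, and the placement of no_build under docker_server is an assumption) might be:

```yaml
base_image:
  image: your-registry/hardened-server:1.0   # copied to Baseten unmodified
docker_server:
  no_build: true
  server_port: 8080
  predict_endpoint: /predict
  readiness_endpoint: /health
  liveness_endpoint: /health
```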
Set no_build: true and configure your server’s port and endpoints. Since the image runs unmodified, it must include its own HTTP server and health check endpoints.
start_command is optional with no_build. If omitted, the image’s original ENTRYPOINT runs. If your image needs a different startup command, set start_command to override the entrypoint.
Runtime contract differences
No-build containers bypass Baseten’s reverse proxy and process supervisor, which changes a few things relative to a standard build:

- Port 8080 is not reserved. Your server can bind to any port, including 8080.
- Your server is directly exposed on docker_server.server_port.
- Path routing is 1:1. See Routing below.
- The data/ directory is still copied to /app/data if present in your Truss.
Routing
No-build deployments skip the URL remapping that standard custom server deployments use. All paths exposed by your server are accessible directly through Baseten’s routing. For example, if your server exposes /v2/listen/stream, that path is reachable directly on your model’s endpoint.
Constraints
- Requires a custom server deployment with docker_server configuration. Standard Truss models with a model.py don’t support no_build.
- Development mode is not supported. Deploy with truss push (published deployments are the default).
- Truss config fields beyond docker_server, base_image, environment_variables, secrets, and data are not available. Pass any additional configuration as environment variables.
- If your image runs as a specific user, set run_as_user_id to that UID.
Pass configuration as environment variables
Since Truss config fields aren’t injected into no-build containers, use environment_variables to pass configuration:
config.yaml
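For example (the variable names are hypothetical):

```yaml
environment_variables:
  MODEL_NAME: tinyllama
  MAX_BATCH_SIZE: "8"
```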
Your server reads these values at runtime, for example with os.environ["MODEL_NAME"] in Python.
Next steps
- Private registries: Pull images from AWS ECR, Google Artifact Registry, or Docker Hub
- Secrets: Access API keys and tokens in your container
- WebSockets: Enable WebSocket connections
- vLLM, SGLang, TensorRT-LLM: Deploy LLMs with popular inference servers