

When you write a Model class, Truss uses the Truss server base image by default. However, you can deploy pre-built containers. This guide covers deploying custom Docker containers with Truss.

Custom Docker containers

To deploy a custom Docker container, set base_image to your image and use the docker_server argument to specify how to run it.
config.yaml
base_image:
  image: your-registry/your-image:latest
docker_server:
  start_command: your-server-start-command
  server_port: 8000
  predict_endpoint: /predict
  readiness_endpoint: /health
  liveness_endpoint: /health
  • image: The Docker image to use.
  • start_command: The command to start the server. This overrides the base image’s default entrypoint.
  • server_port: The port your server listens on.
  • predict_endpoint: The endpoint to forward inference requests to.
  • readiness_endpoint: The endpoint to check if the server is ready.
  • liveness_endpoint: The endpoint to check if the server is alive.
Port 8080 is reserved by Baseten’s internal reverse proxy. If your server binds to port 8080, the deployment fails with [Errno 98] address already in use.
For the full list of fields, see the configuration reference.
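A minimal sketch of a server that satisfies this contract, using only the Python standard library: a POST /predict route for inference and a GET /health route for the readiness and liveness probes. The echo response is a placeholder for your real inference logic.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def _send_json(self, status, body):
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # readiness_endpoint and liveness_endpoint both probe /health
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # Baseten forwards /predict traffic to predict_endpoint
        if self.path == "/predict":
            length = int(self.headers.get("Content-Length", 0))
            request = json.loads(self.rfile.read(length) or b"{}")
            self._send_json(200, {"echo": request})  # placeholder inference
        else:
            self._send_json(404, {"error": "not found"})


# To run: HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
# 8000 matches server_port above; avoid 8080, which the proxy reserves.
```

In practice you would use a production server (FastAPI, Flask, or your framework of choice); the only requirement is that it binds to server_port and answers the configured endpoints.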

Non-root user

Containers run as a non-root user by default:
  • Username: app
  • UID / GID: 60000
  • Home directory: /home/app
If your base image expects a specific non-root UID, set run_as_user_id under docker_server:
config.yaml
base_image:
  image: your-registry/your-image:latest
docker_server:
  start_command: your-server-start-command
  server_port: 8000
  predict_endpoint: /predict
  readiness_endpoint: /health
  liveness_endpoint: /health
  run_as_user_id: 1000
The UID must already exist in the base image. Values 0 (root) and 60000 (platform default) are not allowed.
Many NVIDIA base images, including NIM and Triton, run as user ID 1000. Set run_as_user_id: 1000 when using these images.
Baseten automatically sets ownership of /app, /workspace, the packages directory, and $HOME to this UID. If your server writes to directories outside of these, ensure they are writable by the specified UID in your base image or via build_commands.
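For example, if your server writes to a path outside the managed set, you can create it and hand it to your runtime UID during the build. Build steps typically run as root, and the path below is illustrative:

```yaml
build_commands:
  - mkdir -p /models/cache
  - chown -R 1000:1000 /models/cache  # match run_as_user_id
```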

Endpoint routing

While predict_endpoint maps your server’s inference route to Baseten’s /predict endpoint, you can access any route exposed by your server using the sync endpoint.
  • /environments/production/predict → your predict_endpoint route
  • /environments/production/sync/{any/route} → {any/route} in your server
Example: If you set predict_endpoint: /v1/chat/completions:
  • /environments/production/predict → /v1/chat/completions
  • /environments/production/sync/v1/models → /v1/models
All other paths reach your server unchanged, including routes like /metrics and /health. If your server doesn’t handle a requested path, the reverse proxy returns whatever response your server returns (often its own 404).
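A client can construct sync URLs for arbitrary routes like this (the model ID and helper name here are illustrative):

```python
def sync_url(model_id: str, route: str, environment: str = "production") -> str:
    """Build the Baseten URL that proxies to `route` on your custom server."""
    return (
        f"https://model-{model_id}.api.baseten.co"
        f"/environments/{environment}/sync/{route.lstrip('/')}"
    )


print(sync_url("abc123", "/v1/models"))
# https://model-abc123.api.baseten.co/environments/production/sync/v1/models
```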

Container filesystem

Writable directories

Your server process can write to these paths:
  • /app: Application root, including your config.yaml and optional data/
  • /home/app: Home directory ($HOME)
  • /tmp: Temporary files
  • /workspace: General-purpose scratch space
  • /packages: Bundled packages
Paths outside this list are root-owned and not writable by your process. If you need to write elsewhere, change permissions during the build with build_commands, or set run_as_user_id so Baseten chowns the managed paths to your UID.

Working directory

Truss does not set a WORKDIR for custom server builds. The effective working directory is whatever your base image defines (often /). If your server expects a specific working directory, set it in your start_command:
config.yaml
docker_server:
  start_command: sh -c "cd /app && ./my-server"

Secrets

Secrets declared in config.yaml are mounted as read-only files at /secrets/{secret_name}. See Secrets in custom Docker images for usage.
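A minimal sketch of reading a mounted secret, with an environment-variable fallback for local development. The secret name and helper are illustrative:

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    # On Baseten, secrets appear as read-only files under /secrets/.
    secret_file = Path("/secrets") / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    # Local fallback: look for an uppercased environment variable.
    return os.environ.get(name.upper(), default)


hf_token = read_secret("hf_access_token")
```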

Runtime environment

Baseten sets specific environment variables in every custom-server container to route traffic to your server, identify the container in logs and traces, and keep its runtime path intact. These names are reserved. If you set any of them in environment_variables, Baseten drops the value before deploying the container:
  • PORT
  • HOST, HOSTNAME
  • *_SERVICE_HOST, *_SERVICE_PORT*
  • PATH
Truss warns when it loads your config if you set PORT or HOSTNAME.
PORT is set to 8080 inside every container. Baseten’s reverse proxy listens on that port, so every container inherits PORT=8080 regardless of what your server binds to. If your server code reads os.environ.get("PORT", 8000) (or similar), it gets 8080 instead of your default. Bind your server directly to docker_server.server_port, or read the port from an environment variable you control (for example, MY_SERVER_PORT).
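A sketch of the safe pattern, assuming you define MY_SERVER_PORT yourself in environment_variables:

```python
import os


def server_port(default: int = 8000) -> int:
    # Deliberately NOT os.environ.get("PORT"): on Baseten that is always
    # 8080, the reverse proxy's port. MY_SERVER_PORT is a name you choose
    # and set yourself, so nothing on the platform overrides it.
    return int(os.environ.get("MY_SERVER_PORT", default))
```

Keep the value consistent with docker_server.server_port so the proxy and your server agree.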

Platform-injected environment variables

Baseten sets these in every container at runtime:
  • APP_HOME: /app
  • HOME: /home/app (or /root if running as root)
  • PYTHON_EXECUTABLE: Path to python3 in the base image
  • BT_MODEL_ID: The model’s ID
  • BT_MODEL_DEPLOYMENT_ID: The deployment’s ID
Read BT_MODEL_ID and BT_MODEL_DEPLOYMENT_ID from your server process to tag logs, metrics, or cache keys with deployment identity.
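For example, a cache key scoped to the deployment might look like this (the fallbacks and helper name are illustrative):

```python
import os

# Fall back to "local" when running outside Baseten.
MODEL_ID = os.environ.get("BT_MODEL_ID", "local")
DEPLOYMENT_ID = os.environ.get("BT_MODEL_DEPLOYMENT_ID", "local")


def cache_key(prompt_hash: str) -> str:
    # Scoping cache entries to the deployment ensures a redeploy never
    # serves results produced by older weights or code.
    return f"{MODEL_ID}:{DEPLOYMENT_ID}:{prompt_hash}"
```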

Base image environment variables

Environment variables baked into your base image (ENV UV_EXTRA_INDEX_URL=..., ENV PIP_CONSTRAINT=..., and so on) are visible to your server process at runtime. If your start_command or anything it invokes runs uv or pip, these inherited settings take effect. They don’t affect how Truss builds the container’s internal Python environment that runs the reverse proxy and process supervisor. If you want a clean install environment inside start_command, unset the inherited variables before invoking uv or pip.
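One way to do that is with env -u in the start command. The variable names are examples of settings a base image might bake in, and start.sh is a placeholder for your own launch script:

```yaml
docker_server:
  # `env -u` drops the inherited variables for this command tree only.
  start_command: sh -c "env -u UV_EXTRA_INDEX_URL -u PIP_CONSTRAINT ./start.sh"
```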

Base image requirements

Standard (non-no_build) custom-server builds require:
  • A Debian-based base image (ID=debian or ID_LIKE=debian in /etc/os-release).
  • Python 3.x on PATH. The minor version is validated at build time.
No-build mode has no base image restrictions: your image is used as-is.

Deploy Ollama

This example deploys Ollama with the TinyLlama model using a custom Docker image. Ollama is a popular lightweight LLM inference server, similar to vLLM or SGLang. TinyLlama is small enough to run on a CPU.

1. Create the config

Create a config.yaml file with the following configuration:
config.yaml
model_name: ollama-tinyllama
base_image:
  image: python:3.11-slim
build_commands:
  - apt-get update && apt-get install -y curl ca-certificates zstd
  - curl -fsSL https://ollama.com/install.sh | sh
docker_server:
  start_command: sh -c "ollama serve & sleep 5 && ollama pull tinyllama && wait"
  readiness_endpoint: /api/tags
  liveness_endpoint: /api/tags
  predict_endpoint: /api/generate
  server_port: 11434
resources:
  cpu: "4"
  memory: 8Gi
The base_image field specifies the Docker image to use as your starting point, in this case a lightweight Python image.

The build_commands section first installs the system packages that the Ollama install script requires (curl, ca-certificates, and zstd), then downloads and installs Ollama. The slim base image doesn’t include these packages by default. You can also use build_commands to install model weights or other dependencies.

The start_command launches the Ollama server, waits for it to initialize, and then pulls the TinyLlama model. The readiness_endpoint and liveness_endpoint both point to /api/tags, which returns successfully when Ollama is running. The predict_endpoint maps Baseten’s /predict route to Ollama’s /api/generate endpoint.

Finally, declare your resource requirements. This example only needs 4 CPUs and 8 GB of memory. For a complete list of resource options, see the Resources page.

2. Deploy

To deploy the model, use the following:
truss push
Truss builds the Docker image and deploys it to Baseten as a published (production) deployment. Once the readiness_endpoint and liveness_endpoint return successfully, the model is ready to use.

3. Run inference

Ollama generates text through its /api/generate endpoint. Since you mapped Baseten’s /predict route to /api/generate, you can run inference by calling the /predict endpoint.
To run inference with Truss, use the predict command:
truss predict -d '{"model": "tinyllama", "prompt": "Write a short story about a robot dreaming", "stream": false, "options": {"num_predict": 50}}'
The following is an example of its response:
It was a dreary, grey day when the robots started to dream. 
They had been programmed to think like humans, but it wasn't until they began to dream that they realized just how far apart they actually were.
Congratulations! You’ve successfully deployed and run inference on a custom Docker image.

Per-request logging

Baseten assigns a unique request ID to every predict call and returns it in the X-Baseten-Request-Id response header. You can use this ID to filter your model’s logs down to a single request. For standard Truss models, request ID logging is automatic. For custom HTTP servers, extract the request ID from the incoming X-Baseten-Request-Id header and include it in your JSON log output:
server.py
import json
import logging
import sys

from fastapi import FastAPI, Request


class JSONFormatter(logging.Formatter):
    """Formats logs as JSON with request_id for Baseten log filtering."""

    def format(self, record):
        log_record = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        if getattr(record, "request_id", None):
            log_record["request_id"] = record.request_id
        return json.dumps(log_record)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

app = FastAPI()


@app.post("/predict")
async def predict(request: Request):
    request_id = request.headers.get("x-baseten-request-id")

    logger.info("Predict called", extra={"request_id": request_id})

    # ... your inference logic ...

    logger.info("Predict complete", extra={"request_id": request_id})
    return {"result": "..."}
Logs must be JSON formatted and written to stdout. The request_id field must be a top-level key in the JSON object.
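To sanity-check the formatter locally before deploying, you can format a synthetic record. This is a self-contained copy of the formatter above applied to a hand-built LogRecord:

```python
import json
import logging


class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_record = {"level": record.levelname, "message": record.getMessage()}
        if getattr(record, "request_id", None):
            log_record["request_id"] = record.request_id
        return json.dumps(log_record)


# Build a record by hand, as logger.info(..., extra={...}) would.
record = logging.LogRecord("server", logging.INFO, __file__, 0, "Predict called", None, None)
record.request_id = "req-123"
print(JSONFormatter().format(record))
# {"level": "INFO", "message": "Predict called", "request_id": "req-123"}
```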

No-build deployment

For security-hardened images that must remain completely unmodified, use no_build to skip the build step entirely. Baseten copies the image to its container registry without running docker build. No-build is only available for custom server deployments. Your Truss must use docker_server configuration. Standard Truss models with a model.py don’t support no_build.
No-build deployments are not enabled by default. Contact support to enable this feature for your organization.
config.yaml
base_image:
  image: your-registry/your-hardened-image:latest
docker_server:
  no_build: true
  server_port: 8000
  predict_endpoint: /predict
  readiness_endpoint: /health
  liveness_endpoint: /health
Set no_build: true and configure your server’s port and endpoints. Since the image runs unmodified, it must include its own HTTP server and health check endpoints. start_command is optional with no_build. If omitted, the image’s original ENTRYPOINT runs. If your image needs a different startup command, set start_command to override the entrypoint.

Runtime contract differences

No-build containers bypass Baseten’s reverse proxy and process supervisor, which changes a few things relative to a standard build:
  • Port 8080 is not reserved. Your server can bind to any port, including 8080.
  • Your server is directly exposed on docker_server.server_port.
  • Path routing is 1:1. See Routing below.
  • The data/ directory is still copied to /app/data if present in your Truss.

Routing

No-build deployments skip the URL remapping that standard custom server deployments use. All paths exposed by your server are accessible directly through Baseten’s routing. For example, if your server exposes /v2/listen/stream, you can reach it at:
https://model-<model_id>.api.baseten.co/environments/production/sync/v2/listen/stream
predict_endpoint has no effect on no-build deployments because Baseten does not remap paths. However, it’s still a required field, so setting it correctly serves as useful documentation of your server’s primary inference route.

Constraints

  • Requires a custom server deployment with docker_server configuration. Standard Truss models with a model.py don’t support no_build.
  • Development mode is not supported. Deploy with truss push (published deployments are the default).
  • Truss config fields beyond docker_server, base_image, environment_variables, secrets, and data are not available. Pass any additional configuration as environment variables.
  • If your image runs as a specific user, set run_as_user_id to that UID.

Pass configuration as environment variables

Since Truss config fields aren’t injected into no-build containers, use environment_variables to pass configuration:
config.yaml
base_image:
  image: your-registry/your-hardened-image:latest
docker_server:
  no_build: true
  server_port: 8000
  predict_endpoint: /predict
  readiness_endpoint: /health
  liveness_endpoint: /health
environment_variables:
  MODEL_NAME: my-model
  MAX_BATCH_SIZE: "32"
Access these in your server code with os.environ["MODEL_NAME"].
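A sketch of parsing these at startup. The names match the config above; values always arrive as strings, so convert and validate them once:

```python
import os

MODEL_NAME = os.environ.get("MODEL_NAME", "my-model")
MAX_BATCH_SIZE = int(os.environ.get("MAX_BATCH_SIZE", "32"))

if MAX_BATCH_SIZE <= 0:
    raise ValueError("MAX_BATCH_SIZE must be a positive integer")
```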

Next steps