When you deploy a model with a Truss Model class, Truss uses the
Truss server base image
by default. However, you can also deploy pre-built containers.
In this guide, you will learn how to set up your configuration file to run a
custom Docker image and deploy it to Baseten using Truss.
Configuration
To deploy a custom Docker image, set base_image to your image
and use the docker_server argument to specify how to run it.
config.yaml
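A minimal sketch of the relevant fields (the image name, command, port, and routes below are placeholder assumptions; substitute your own):

```yaml
base_image:
  image: your-registry/your-image:latest  # placeholder: your custom Docker image
docker_server:
  start_command: python3 server.py        # placeholder: command that starts your server
  server_port: 8000                       # placeholder: port your server listens on
  predict_endpoint: /generate             # placeholder: your server's inference route
  readiness_endpoint: /health             # placeholder: readiness check route
  liveness_endpoint: /health              # placeholder: liveness check route
```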
- image: The Docker image to use.
- start_command: The command to start the server.
- server_port: The port to listen on.
- predict_endpoint: The endpoint to forward requests to.
- readiness_endpoint: The endpoint to check if the server is ready.
- liveness_endpoint: The endpoint to check if the server is alive.
Non-root user
If your base image expects a specific non-root UID, set run_as_user_id under docker_server:
config.yaml
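For example, a sketch that runs the server as UID 1000:

```yaml
docker_server:
  run_as_user_id: 1000  # UID expected by your base image
```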
User IDs 0 (root) and 60000 (platform default) are not allowed.
Many NVIDIA base images, including NIM and Triton, run as user ID
1000. Set run_as_user_id: 1000 when using these images. Baseten assigns ownership of /app, /workspace, the packages directory, and $HOME to this UID. If your server writes to directories outside of these, ensure they are writable by the specified UID in your base image or via build_commands.
Endpoint mapping
While predict_endpoint maps your server’s inference route to Baseten’s
/predict endpoint, you can access any route in your server using the
sync endpoint.

| Baseten endpoint | Maps to |
|---|---|
| /environments/production/predict | Your predict_endpoint route |
| /environments/production/sync/{any/route} | /{any/route} in your server |

Example: If you set predict_endpoint: /v1/chat/completions:

| Baseten endpoint | Maps to |
|---|---|
| /environments/production/predict | /v1/chat/completions |
| /environments/production/sync/v1/models | /v1/models |
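For instance, a request to the sync route might look like this (the model ID is a hypothetical placeholder):

```bash
# Hits /v1/models on your server via Baseten's sync route
curl "https://model-abcd1234.api.baseten.co/environments/production/sync/v1/models" \
  -H "Authorization: Api-Key $BASETEN_API_KEY"
```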
Deploy Ollama
This example deploys Ollama with the TinyLlama model using a custom Docker image. Ollama is a popular lightweight LLM inference server, similar to vLLM or SGLang. TinyLlama is small enough to run on a CPU.
1. Create the config
Create a config.yaml file with the following configuration:
config.yaml
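A sketch of such a configuration, matching the description below; the base image tag, install commands, and wait time are assumptions:

```yaml
base_image:
  image: python:3.11-slim  # assumption: any lightweight base image works
build_commands:
  # Install Ollama into the container at build time (assumed install steps)
  - apt-get update && apt-get install -y curl
  - curl -fsSL https://ollama.com/install.sh | sh
docker_server:
  # Start Ollama, wait for it to initialize, then pull TinyLlama
  start_command: sh -c "ollama serve & sleep 10 && ollama pull tinyllama && wait"
  server_port: 11434            # Ollama's default port
  predict_endpoint: /api/generate
  readiness_endpoint: /api/tags
  liveness_endpoint: /api/tags
resources:
  cpu: "4"
  memory: 8Gi
  use_gpu: false
```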
The base_image field specifies the Docker image to use as your starting
point, in this case a lightweight Python image. The build_commands section
installs Ollama into the container at build time. You can also use this
section to install model weights or other dependencies.
The start_command launches the Ollama server, waits for it to initialize, and
then pulls the TinyLlama model.
The readiness_endpoint and liveness_endpoint
both point to /api/tags, which returns successfully when Ollama is running.
The predict_endpoint maps Baseten’s /predict route to Ollama’s
/api/generate endpoint.
Finally, declare your resource requirements. This example only needs 4 CPUs and
8GB of memory. For a complete list of resource options, see the
Resources page.
2. Deploy
To deploy the model, run truss push. Once requests to the readiness_endpoint and liveness_endpoint are successful, the model will be ready to use.
3. Run inference
Ollama uses OpenAI API compatible endpoints to run inference and calls /api/generate to generate text. Since you mapped the /predict route to
Ollama’s /api/generate endpoint, you can run inference by calling the
/predict endpoint.
You can run inference with the Truss CLI, cURL, or Python. For example, to run inference with the Truss CLI, use the predict command:
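A sketch of such a call; the request payload is an illustrative assumption matching Ollama's /api/generate schema:

```bash
# Sends the request body to the mapped /predict route (Ollama's /api/generate)
truss predict -d '{"model": "tinyllama", "prompt": "Why is the sky blue?"}'
```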
No-build deployment
For security-hardened images that must remain completely unmodified, use no_build to skip the build step entirely. Baseten copies the image to its container registry without running docker build.
No-build is only available for custom server deployments. Your Truss must use docker_server configuration. Standard Truss models with a model.py don’t support no_build.
No-build deployments are not enabled by default. Contact support to enable this feature for your organization.
config.yaml
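A sketch, assuming no_build sits under docker_server alongside the server settings (the image name and routes are placeholders):

```yaml
base_image:
  image: your-registry/hardened-image:1.0  # placeholder: your pre-built image
docker_server:
  no_build: true
  server_port: 8000             # placeholder: port your server already listens on
  predict_endpoint: /predict    # placeholder: your server's inference route
  readiness_endpoint: /health   # placeholder: existing health check route
  liveness_endpoint: /health
```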
Set no_build: true and configure your server’s port and endpoints. Since the image runs unmodified, it must include its own HTTP server and health check endpoints.
start_command is optional with no_build. If omitted, the image’s original ENTRYPOINT runs. If your image needs a different startup command, set start_command to override the entrypoint.
Constraints
- Requires a custom server deployment with docker_server configuration. Standard Truss models with a model.py don’t support no_build.
- Development mode is not supported. Deploy with truss push (published deployments are the default).
- Truss config fields beyond docker_server, base_image, environment_variables, and secrets are not available. Pass any additional configuration as environment variables.
- If your image runs as a specific user, set run_as_user_id to that UID.
Pass configuration as environment variables
Since Truss config fields aren’t injected into no-build containers, use environment_variables to pass configuration:
config.yaml
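For example (the variable names and values are hypothetical):

```yaml
environment_variables:
  MODEL_NAME: tinyllama   # hypothetical example
  MAX_TOKENS: "512"       # hypothetical example
```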
Your server can then read these values at runtime, for example with os.environ["MODEL_NAME"] in Python.
Next steps
- Private registries: Pull images from AWS ECR, Google Artifact Registry, or Docker Hub
- Secrets: Access API keys and tokens in your container
- WebSockets: Enable WebSocket connections
- vLLM, SGLang, TensorRT-LLM: Deploy LLMs with popular inference servers