Implementation (Advanced)
Deploy Custom Servers from Docker Images
A config.yaml is all you need
If you have an existing API server packaged in a Docker image—whether an open-source server like vLLM or a custom-built image—you can deploy it on Baseten with just a config.yaml file.
1. Configuring a Custom Server in config.yaml
Define a Docker-based server by adding docker_server:
config.yaml
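A minimal sketch of the docker_server block (the image name, start command, and ports below are placeholders for your own server):

```yaml
base_image:
  image: your-org/your-server:latest   # hypothetical custom image
docker_server:
  start_command: python3 server.py     # command that launches your server
  server_port: 8000                    # port the server listens on
  predict_endpoint: /predict           # endpoint that serves requests
  readiness_endpoint: /health          # Kubernetes readiness probe
  liveness_endpoint: /health           # Kubernetes liveness probe
resources:
  cpu: "1"
  memory: 2Gi
```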
Key Configurations
- start_command (required) – Command to start the server.
- predict_endpoint (required) – Endpoint for serving requests (only one per model).
- server_port (required) – Port where the server runs.
- readiness_endpoint (required) – Used for Kubernetes readiness probes to determine when the container is ready to accept traffic.
- liveness_endpoint (required) – Used for Kubernetes liveness probes to determine if the container needs to be restarted.
2. Example: Running a vLLM Server
This example deploys Meta-Llama-3.1-8B-Instruct using vLLM on an A10G GPU, with /health as the readiness and liveness probe.
config.yaml
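A sketch of such a config, assuming the vllm/vllm-openai image (exact image tag and CLI flags may differ in your setup; a gated model like Llama 3.1 also needs a Hugging Face token, covered in the secrets section):

```yaml
base_image:
  image: vllm/vllm-openai:latest
docker_server:
  start_command: vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --port 8000
  server_port: 8000
  predict_endpoint: /v1/chat/completions
  readiness_endpoint: /health
  liveness_endpoint: /health
resources:
  accelerator: A10G
  use_gpu: true
secrets:
  hf_access_token: null
```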
vLLM’s /health endpoint is used to determine when the server is ready or needs restarting.
More examples are available in the Truss examples repo.
3. Installing Custom Python Packages
To install additional Python dependencies, add a requirements.txt file to your Truss.
Example: Infinity Embedding Model Server
config.yaml
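One possible sketch: install the server from requirements.txt (e.g. a line such as infinity-emb[all]) and launch it with docker_server. The package name, CLI flags, model ID, and port here are assumptions—check the Infinity docs for your version:

```yaml
base_image:
  image: python:3.11-slim
docker_server:
  start_command: infinity_emb v2 --model-id BAAI/bge-small-en-v1.5 --port 7997
  server_port: 7997
  predict_endpoint: /embeddings
  readiness_endpoint: /health
  liveness_endpoint: /health
requirements_file: ./requirements.txt
resources:
  accelerator: L4
  use_gpu: true
```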
4. Accessing Secrets in Custom Servers
To use API keys or other secrets, store them in Baseten and access them from /secrets in the container.
Example: Accessing a Hugging Face Token
config.yaml
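Declaring the secret in config.yaml might look like this (the placeholder value stays null in the config; the real token must also be added in your Baseten workspace):

```yaml
secrets:
  hf_access_token: null
```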
Inside your server, access it like this:
More on secrets management is available in the Baseten secrets documentation.