Dockerized model
Deploy any model in a pre-built Docker container
In this example, we deploy a dockerized model for the Infinity embedding server, a high-throughput, low-latency REST API server for serving vector embeddings.
Setting up the config.yaml
To deploy a dockerized model, all you need is a config.yaml. It specifies how to build your Docker image, start the server, and manage resources. Let’s break down each section.
Base Image
Sets the foundational Docker image to a lightweight Python 3.11 environment.
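A minimal sketch of this section, assuming the slim Python 3.11 image (check the example repo for the exact tag):

```yaml
base_image:
  # Lightweight Python 3.11 base image for the container
  image: python:3.11-slim
```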
Docker Server Configuration
Configures the server’s startup command, health check endpoints, prediction endpoint, and the port on which the server will run.
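A sketch of this section; the model id is illustrative, and the endpoints and port follow Infinity's defaults:

```yaml
docker_server:
  # Launch the Infinity embedding server (the model id here is an example)
  start_command: sh -c "infinity_emb v2 --model-id BAAI/bge-small-en-v1.5"
  # Infinity exposes /health, used for both readiness and liveness checks
  readiness_endpoint: /health
  liveness_endpoint: /health
  # Prediction requests are routed to this endpoint
  predict_endpoint: /embeddings
  # Infinity's default port
  server_port: 7997
```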
Build Commands (Optional)
Pre-downloads model weights during the build phase to ensure the model is ready at container startup.
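A hypothetical example of pre-downloading weights at build time using huggingface_hub's snapshot_download; the actual example may fetch a different model:

```yaml
build_commands:
  # Bake the model weights into the image so the server starts quickly
  - pip install huggingface_hub
  - python -c "from huggingface_hub import snapshot_download; snapshot_download('BAAI/bge-small-en-v1.5')"
```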
Configure resources
Specifies the compute resources for the deployment. Note that we need an L4 GPU to run this model.
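For example:

```yaml
resources:
  # Request an NVIDIA L4 GPU for the deployment
  accelerator: L4
  use_gpu: true
```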
Requirements
Lists the Python package dependencies required for the infinity embedding server.
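A sketch; the example repo may pin a specific version:

```yaml
requirements:
  # Infinity embedding server with all optional extras
  - infinity-emb[all]
```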
Runtime Settings
Sets the server to handle up to 40 concurrent inferences to manage load efficiently.
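In Truss config this is the predict_concurrency setting:

```yaml
runtime:
  # Allow up to 40 requests to be processed concurrently
  predict_concurrency: 40
```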
Environment Variables
Defines essential environment variables including the Hugging Face access token, request batch size, queue size limit, and a flag to disable tracking.
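A sketch of this section; the variable names follow Infinity's env-var conventions but should be confirmed against the example repo, and the token value is a placeholder:

```yaml
environment_variables:
  # Hugging Face access token (placeholder; supply a real value or use secrets)
  HF_TOKEN: null
  # Maximum number of requests batched together per forward pass
  INFINITY_BATCH_SIZE: "32"
  # Cap on queued requests awaiting processing
  INFINITY_QUEUE_SIZE: "512"
  # Disable anonymous usage tracking
  DO_NOT_TRACK: "1"
```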
Deploy dockerized model
Deploy the model like you would other Trusses, with:
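```sh
truss push
```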