Setting up the config.yaml
To deploy a dockerized model, all you need is a config.yaml file. It specifies how to build your Docker image, start the server, and manage resources. Let’s break down each section.
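As a quick orientation, this sketch treats a parsed config.yaml as a Python dictionary and reports which of the sections covered in this walkthrough are not set yet. The key names (base_image, docker_server, build_commands, resources, requirements, runtime, environment_variables) are assumptions based on Truss's config format, so confirm them against the Truss configuration reference. The check reflects this guide's layout, not a rule enforced by the deploy tooling.

```python
# Section names are assumed to match Truss's config.yaml keys;
# verify them against the Truss configuration reference.
EXPECTED_SECTIONS = (
    "base_image",
    "docker_server",
    "build_commands",  # optional in this walkthrough
    "resources",
    "requirements",
    "runtime",
    "environment_variables",
)
OPTIONAL_SECTIONS = {"build_commands"}


def missing_sections(config: dict) -> list[str]:
    """Return the non-optional sections from this guide that `config` lacks."""
    return [
        name
        for name in EXPECTED_SECTIONS
        if name not in OPTIONAL_SECTIONS and name not in config
    ]


if __name__ == "__main__":
    # A trimmed config, shaped like the result of yaml.safe_load(...)
    config = {
        "base_image": {"image": "python:3.11-slim"},
        "docker_server": {"server_port": 7997},
        "resources": {"accelerator": "L4"},
        "requirements": [],
    }
    print(missing_sections(config))  # ['runtime', 'environment_variables']
```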
Base Image
Sets the base Docker image to a lightweight Python 3.11 environment.
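An image reference is a repository plus a tag. Assuming the lightweight Python 3.11 image is referenced as python:3.11-slim (an official Docker Hub tag; the exact tag this guide uses is not shown here), this sketch splits a reference into its parts and reads the Python version from the tag. That makes it easy to sanity-check that the image matches the Python version your requirements expect.

```python
def parse_image_ref(ref: str) -> tuple[str, str]:
    """Split a Docker image reference into (repository, tag).

    A colon marks a tag only if it appears after the last '/', so a
    registry port such as 'localhost:5000/app' is not mistaken for a tag.
    Digest references ('@sha256:...') are not handled in this sketch.
    """
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    if last_colon > last_slash:
        return ref[:last_colon], ref[last_colon + 1:]
    return ref, "latest"  # Docker's default tag when none is given


def python_version_from_tag(tag: str) -> str:
    """Extract the leading 'X.Y' version from a tag like '3.11-slim'."""
    return ".".join(tag.split("-")[0].split(".")[:2])


if __name__ == "__main__":
    repo, tag = parse_image_ref("python:3.11-slim")
    print(repo, tag, python_version_from_tag(tag))
```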
Docker Server Configuration
Configures the server’s startup command, its health check endpoints, the prediction endpoint, and the port the server listens on.
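Health check endpoints are polled over HTTP until the server answers. This sketch mimics that behavior with a plain polling loop. The probe is injected so the loop can be tested without a running server, and the example URL (port 7997, path /health) is an assumption, not a value taken from this guide's config.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that returns True when `url` answers with HTTP 200."""
    def probe() -> bool:
        # urlopen raises HTTPError (an OSError subclass) for non-2xx codes.
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    return probe


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Call `probe` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            # Connection refused and similar errors while the server starts.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical readiness URL; adjust to your server's port and path.
    ready = wait_until_ready(http_probe("http://localhost:7997/health"), timeout_s=5)
    print("ready" if ready else "not ready")
```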
Build Commands (Optional)
Pre-downloads the model weights during the build phase so the model is ready as soon as the container starts.
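Build commands are shell commands run while the image is built. One way to pre-download weights is to call huggingface_hub.snapshot_download from an inline Python command, and this helper generates such a command. The model ID is a placeholder, and the approach assumes the server reads from the same Hugging Face cache directory at runtime.

```python
import shlex


def download_weights_command(repo_id: str, revision: str = "main") -> str:
    """Return a shell command that pre-fetches a Hugging Face repo at build time."""
    code = (
        "from huggingface_hub import snapshot_download; "
        f"snapshot_download(repo_id={repo_id!r}, revision={revision!r})"
    )
    # shlex.quote keeps the inline Python intact as a single shell argument.
    return f"python -c {shlex.quote(code)}"


if __name__ == "__main__":
    # Placeholder model ID, not necessarily the model used in this guide.
    print(download_weights_command("BAAI/bge-small-en-v1.5"))
```

The resulting string could be listed as one entry under the build commands section.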
Resources
This model requires an NVIDIA L4 GPU.
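Whether a model fits on a given GPU comes down mostly to memory. The NVIDIA L4 has 24 GB of GPU memory. This back-of-the-envelope check multiplies the parameter count by bytes per parameter (2 for fp16/bf16) and applies an assumed 20% margin for activations and runtime overhead. Treat it as a rough heuristic rather than a guarantee, because actual usage also depends on batch size and sequence length.

```python
L4_MEMORY_GB = 24  # NVIDIA L4: 24 GB of GPU memory


def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GB (2 bytes/param for fp16/bf16)."""
    return num_params * bytes_per_param / 1e9


def fits_on_gpu(
    num_params: float,
    bytes_per_param: int = 2,
    gpu_memory_gb: float = L4_MEMORY_GB,
    overhead: float = 1.2,  # assumed 20% margin, not a measured value
) -> bool:
    """Rough check that weights plus overhead fit in GPU memory."""
    return weights_gb(num_params, bytes_per_param) * overhead <= gpu_memory_gb


if __name__ == "__main__":
    for params in (335e6, 7e9, 13e9):
        print(f"{params:.3g} params: {weights_gb(params):.2f} GB, fits={fits_on_gpu(params)}")
```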
Requirements
Lists the Python packages the Infinity embedding server depends on.
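Each entry under requirements is a pip requirement specifier. To see locally which entries are already installed before you build, this sketch strips extras and version constraints from each entry and looks up the name with importlib.metadata. The specifier infinity-emb[all]>=0.0.1 in the usage is an illustrative assumption about how the Infinity package might be listed.

```python
import re
from importlib.metadata import PackageNotFoundError, version

_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(spec: str) -> str:
    """Strip extras and version specifiers from a requirement string."""
    match = _NAME.match(spec)
    if match is None:
        raise ValueError(f"not a requirement spec: {spec!r}")
    return match.group(1)


def missing_packages(requirements: list[str]) -> list[str]:
    """Return requirement names with no installed distribution.

    Only presence is checked; version constraints are ignored in this sketch.
    """
    missing = []
    for spec in requirements:
        name = requirement_name(spec)
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(requirement_name("infinity-emb[all]>=0.0.1"))
    print(missing_packages(["infinity-emb[all]", "einops"]))
```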
Runtime Settings
Allows the server to process up to 40 inference requests concurrently, which bounds the load placed on the model at any one time.
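A concurrency limit of 40 means at most 40 requests are inside the model at once. This asyncio sketch illustrates those semantics with a semaphore of the same size. It models the behavior and is not how the serving runtime implements it; the 10 ms sleep stands in for an inference call.

```python
import asyncio

PREDICT_CONCURRENCY = 40  # mirrors the runtime setting described above


async def handle(request_id: int, sem: asyncio.Semaphore, stats: dict) -> int:
    """Stand-in for one inference; at most PREDICT_CONCURRENCY run at once."""
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # placeholder for the model call
        stats["in_flight"] -= 1
    return request_id


async def serve(num_requests: int) -> dict:
    """Fire all requests at once; extra requests wait for a free slot."""
    sem = asyncio.Semaphore(PREDICT_CONCURRENCY)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(handle(i, sem, stats) for i in range(num_requests))
    )
    stats["completed"] = len(results)
    return stats


if __name__ == "__main__":
    print(asyncio.run(serve(100)))
```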
Environment Variables
Defines the environment variables the server needs: the Hugging Face access token, the request batch size, the queue size limit, and a flag that disables usage tracking.
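Environment variables arrive as strings, so the server has to parse them. This sketch loads the four settings into a typed object. The variable names (HF_TOKEN, INFINITY_BATCH_SIZE, INFINITY_QUEUE_SIZE, DO_NOT_TRACK) and the default values are hypothetical, chosen for illustration; use whatever names your config.yaml and server actually read.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    hf_token: Optional[str]
    batch_size: int
    queue_size: int
    do_not_track: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read server settings from environment variables (names are assumed)."""
    return ServerSettings(
        hf_token=env.get("HF_TOKEN"),
        batch_size=int(env.get("INFINITY_BATCH_SIZE", "32")),   # hypothetical default
        queue_size=int(env.get("INFINITY_QUEUE_SIZE", "512")),  # hypothetical default
        do_not_track=env.get("DO_NOT_TRACK", "0").strip().lower() in {"1", "true", "yes"},
    )


if __name__ == "__main__":
    print(load_settings({"INFINITY_BATCH_SIZE": "8", "DO_NOT_TRACK": "1"}))
```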