View example on GitHub
Set up imports and torch settings
In this example, we use the Hugging Face diffusers library to build our text-to-image model.

model/model.py
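As a rough sketch, the top of model/model.py pulls in torch, the diffusers pipeline class, and the standard-library helpers used later for base64 encoding. The exact imports and torch settings here are assumptions, not the verbatim example code.

```python
# Sketch of the imports and torch settings; exact contents are assumptions.
import base64
from io import BytesIO

import torch
from diffusers import FluxPipeline

MODEL_ID = "black-forest-labs/FLUX.1-schnell"  # Flux Schnell weights on Hugging Face
torch.backends.cuda.matmul.allow_tf32 = True  # common matmul speed setting; optional
```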
Define the Model class and load function
In the `load` function of the Truss, we implement the logic for downloading and setting up the model. For this model, we use the `FluxPipeline` class from `diffusers` to instantiate our Flux pipeline and configure a number of relevant parameters. See the diffusers docs for details on all of these parameters.
model/model.py
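Continuing the sketch above, a minimal `load` might look like the following, assuming the FLUX.1-schnell weights and bfloat16 precision; the real model.py may configure additional parameters.

```python
class Model:
    def __init__(self, **kwargs):
        self._pipe = None

    def load(self):
        # Download the weights and move the pipeline onto the GPU.
        self._pipe = FluxPipeline.from_pretrained(
            MODEL_ID,
            torch_dtype=torch.bfloat16,
        ).to("cuda")
```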
Define the predict function
The `predict` function contains the actual inference logic. The steps here are:
- Setting up the generation params, including the prompt, image width, image height, number of inference steps, etc.
- Running the diffusion pipeline
- Converting the resulting image to base64 and returning it
model/model.py
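A sketch of a `predict` implementation following the three steps above; the input field names and defaults are illustrative assumptions.

```python
class Model:
    # ... __init__ and load as sketched above ...

    def predict(self, model_input: dict) -> dict:
        # 1. Set up the generation params.
        prompt = model_input["prompt"]
        width = model_input.get("width", 1024)
        height = model_input.get("height", 1024)
        steps = model_input.get("num_inference_steps", 4)

        # 2. Run the diffusion pipeline.
        image = self._pipe(
            prompt,
            width=width,
            height=height,
            num_inference_steps=steps,
        ).images[0]

        # 3. Convert the resulting PIL image to base64 and return it.
        buffer = BytesIO()
        image.save(buffer, format="PNG")
        return {"data": base64.b64encode(buffer.getvalue()).decode("utf-8")}
```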
Setting up the config.yaml
Running Flux Schnell requires a handful of Python libraries, including `diffusers`, `transformers`, and others.
config.yaml
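A sketch of the requirements section; the exact package list and any version pins in the real config.yaml may differ.

```yaml
requirements:
  - diffusers
  - transformers
  - torch
  - accelerate
  - sentencepiece
```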
Configuring resources for Flux Schnell
Note that we need an H100 40GB GPU to run this model.

config.yaml
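The resources section might look like the following; the exact accelerator string is an assumption, so check the accelerator identifiers your platform supports.

```yaml
resources:
  accelerator: H100_40GB  # assumed identifier for the H100 40GB noted above
  use_gpu: true
```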
System Packages
Running diffusers requires `ffmpeg` and a couple of other system packages.
config.yaml
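A sketch of the system packages section; `ffmpeg` comes from the text above, while the companion libraries are assumptions commonly needed for image handling.

```yaml
system_packages:
  - ffmpeg
  - libsm6    # assumed extras for image handling
  - libxext6
```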
Enabling Caching
Flux Schnell is a large model, and downloading it can take several minutes, which makes for a long cold start time. We can solve that with our build caching feature, which moves the model download to the build stage of your deployment: caching the model takes about 15 minutes initially, but you get ~20s cold starts subsequently. To enable caching, add the following to the config:
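Assuming Truss's `model_cache` config key, the caching stanza could look like this; confirm the key and repo ID against the Truss docs.

```yaml
model_cache:
  - repo_id: black-forest-labs/FLUX.1-schnell
```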
Deploy the model

Deploy the model like you would other Trusses, with:
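For reference, the standard Truss deploy command:

```sh
truss push
```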
Run an inference

Use a Python script to call the model once it's deployed and parse its response. We decode the resulting base64-encoded string output into an actual image file, output_image.jpg.
infer.py
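A sketch of infer.py, assuming a Baseten-style endpoint and a response of the form `{"data": "<base64 string>"}`; the URL, API key, and response field are placeholders to adapt to your deployment.

```python
import base64

import requests

# Hypothetical endpoint and key; substitute your deployed model's values.
resp = requests.post(
    "https://model-<model-id>.api.baseten.co/production/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"prompt": "a photo of a red panda reading a book"},
)
resp.raise_for_status()

# Decode the base64-encoded string output into an actual image file.
with open("output_image.jpg", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"]))
```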