Image generation
Building a text-to-image model with Flux Schnell
This example walks through a Truss that serves a text-to-image model. We use Flux Schnell, one of the highest-performing text-to-image models available today.
Set up imports and torch settings
In this example, we use the Hugging Face diffusers library to build our text-to-image model.
Define the Model class and load function
In the load function of the Truss, we implement logic involved in downloading and setting up the model. For this model, we use the FluxPipeline class in diffusers to instantiate our Flux pipeline and configure a number of relevant parameters. See the diffusers docs for details on all of these parameters.
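As a rough illustration of what such a load function might look like, here is a minimal sketch. The checkpoint name, the bfloat16 dtype, and the attribute names pipe and repo_id are assumptions rather than the example's confirmed settings. The imports are deferred into load() only so the sketch can be imported on a machine without torch; in a real model.py they would normally sit at the top of the file.

```python
class Model:
    """Truss-style model wrapper: load() runs once at startup, before any requests."""

    def __init__(self, **kwargs):
        # The pipeline is created in load(), so construction stays cheap.
        self.pipe = None
        # Assumed Hugging Face Hub checkpoint name for Flux Schnell.
        self.repo_id = "black-forest-labs/FLUX.1-schnell"

    def load(self):
        # Deferred imports (an assumption of this sketch, see the note above).
        import torch
        from diffusers import FluxPipeline

        # Download (or read from cache) the weights and move the pipeline to the GPU.
        # bfloat16 roughly halves weight memory relative to float32.
        self.pipe = FluxPipeline.from_pretrained(
            self.repo_id, torch_dtype=torch.bfloat16
        ).to("cuda")
```

Keeping the heavy work in load() rather than __init__ means the download and GPU transfer happen once per replica, not once per request.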
This is a utility function for converting a PIL image to base64.
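A minimal sketch of such a utility is shown below. The function name convert_to_b64 and the JPEG default are assumptions. The function relies only on the image object exposing a PIL-style save(fp, format=...) method, so it does not import PIL itself.

```python
import base64
from io import BytesIO


def convert_to_b64(image, image_format="JPEG"):
    """Encode a PIL-style image as a base64 string."""
    buffered = BytesIO()
    # Serialize the image into an in-memory buffer instead of a file on disk.
    image.save(buffered, format=image_format)
    # base64-encode the raw bytes and return text that is safe to embed in JSON.
    return base64.b64encode(buffered.getvalue()).decode("utf-8")
```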
Define the predict function
The predict function contains the actual inference logic. The steps here are:
- Set up the generation params, such as the prompt, image width, image height, and number of inference steps
- Run the diffusion pipeline
- Convert the resulting image to base64 and return it
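The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the example's exact code: the default values (1024x1024, 4 steps, guidance scale 0.0), the "data" response key, and the helper name build_generation_params are all hypothetical. In a Truss, predict would be a method on the Model class; here it takes the pipeline as an argument so the sketch stands alone.

```python
import base64
from io import BytesIO


def build_generation_params(model_input):
    """Step 1: collect generation settings, falling back to assumed defaults."""
    return {
        "prompt": model_input["prompt"],
        "width": model_input.get("width", 1024),
        "height": model_input.get("height", 1024),
        # Schnell is a distilled model meant to run in very few steps.
        "num_inference_steps": model_input.get("num_inference_steps", 4),
        "guidance_scale": model_input.get("guidance_scale", 0.0),
    }


def predict(pipe, model_input):
    """Run the pipeline on one request and return the image as base64."""
    params = build_generation_params(model_input)
    # Step 2: diffusers pipelines return an object whose .images is a list of PIL images.
    image = pipe(**params).images[0]
    # Step 3: serialize to JPEG in memory, then base64-encode for the JSON response.
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    return {"data": base64.b64encode(buffered.getvalue()).decode("utf-8")}
```

Reading optional fields with .get() lets callers send only a prompt while still allowing per-request overrides.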
Setting up the config.yaml
Running Flux Schnell requires a handful of Python libraries, including diffusers, transformers, and others.
Configuring resources for Flux Schnell
Note that we need an H100 40GB GPU to run this model.
System Packages
Running diffusers requires ffmpeg and a couple of other system packages.
Enabling Caching
Flux Schnell is a large model, and downloading it can take several minutes, so cold starts are slow. We can solve this with our build caching feature, which moves the model download to the build stage. Caching the model takes about 15 minutes initially, but subsequent cold starts drop to roughly 20 seconds.
To enable caching, add the following to the config:
Deploy the model
Deploy the model as you would any other Truss, with the truss push command.
Run an inference
Use a Python script to call the model once it's deployed and parse its response. We decode the resulting base64-encoded string into an actual image file: output_image.jpg.
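A minimal sketch of such a script, using only the standard library, might look like this. The function names, the "data" response key, and the request body shape are assumptions. The URL and API key are placeholders you would replace with your own deployment's values.

```python
import base64
import json
import urllib.request


def call_model(url, api_key, prompt):
    """POST a prompt to the deployed model and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def save_b64_image(b64_string, path="output_image.jpg"):
    """Decode a base64 string and write the raw bytes to an image file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_string))
    return path


# Example usage (placeholders, not real values):
# result = call_model("<your model's predict URL>", "<your API key>", "a corgi in a hat")
# save_b64_image(result["data"])  # assumes the image is returned under "data"
```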