Set up imports and torch settings
In this example, we use the Hugging Face diffusers library to build our text-to-image model.
model/model.py
Define the Model class and load function
In the load function of the Truss, we implement the logic for
downloading and setting up the model. For this model, we use the
FluxPipeline class in diffusers to instantiate our Flux pipeline
and configure a number of relevant parameters.
See the diffusers docs for details
on all of these parameters.
model/model.py
model/model.py
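As a rough illustration of the load step described above, here is a minimal sketch of a Truss Model class. It is not the original example code. It assumes the black-forest-labs/FLUX.1-schnell checkpoint on Hugging Face, bfloat16 weights, and a single CUDA device; the MODEL_ID constant and the lazy imports inside load are choices made for this sketch.

```python
# Assumption: the public Flux Schnell checkpoint on Hugging Face.
MODEL_ID = "black-forest-labs/FLUX.1-schnell"


class Model:
    def __init__(self, **kwargs):
        # Truss passes deployment context (config, secrets, ...) as keyword arguments.
        self._secrets = kwargs.get("secrets")
        self.pipe = None  # populated by load()

    def load(self):
        # Imported lazily so this module can be imported without torch installed.
        import torch
        from diffusers import FluxPipeline

        # Download (or read from cache) the weights and move the pipeline to the GPU.
        # bfloat16 and "cuda" are assumptions for this sketch.
        self.pipe = FluxPipeline.from_pretrained(
            MODEL_ID,
            torch_dtype=torch.bfloat16,
        ).to("cuda")
```

Keeping the heavy work in load rather than in the constructor means it runs once when the model server starts, not on every request.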
Define the predict function
The predict function contains the actual inference logic. The steps here are:
- Set up the generation parameters, such as the prompt, image width, image height, and number of inference steps.
- Run the diffusion pipeline.
- Convert the resulting image to base64 and return it.
model/model.py
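The three steps above can be sketched as a standalone function. This is a hedged illustration, not the original predict method. The input keys, the default values (1024x1024, 4 steps, guidance scale 0.0, maximum sequence length 256), and the "data" output key are assumptions for this sketch. The pipe argument stands in for the loaded FluxPipeline.

```python
import base64
from io import BytesIO


def image_to_base64(image, fmt="JPEG"):
    """Serialize a PIL-style image (any object with save(fp, format=...)) to base64 text."""
    buffer = BytesIO()
    image.save(buffer, format=fmt)
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def predict(pipe, model_input):
    # 1. Set up the generation parameters (defaults here are assumptions).
    prompt = model_input["prompt"]
    params = {
        "height": model_input.get("height", 1024),
        "width": model_input.get("width", 1024),
        "num_inference_steps": model_input.get("num_inference_steps", 4),
        "guidance_scale": model_input.get("guidance_scale", 0.0),
        "max_sequence_length": model_input.get("max_sequence_length", 256),
    }

    # 2. Run the diffusion pipeline; the output exposes a list of images.
    result = pipe(prompt=prompt, **params)
    image = result.images[0]

    # 3. Convert the image to base64 and return it under a hypothetical "data" key.
    return {"data": image_to_base64(image)}
```

Returning base64 text keeps the response JSON-serializable, at the cost of roughly a third more bytes than the raw image.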
Setting up the config.yaml
Running Flux Schnell requires a handful of Python libraries, including
diffusers, transformers, and others.
config.yaml
Configuring resources for Flux Schnell
Note that we need an H100 40GB GPU to run this model.
config.yaml
System packages
Running diffusers requires ffmpeg and a couple of other system
packages.
config.yaml
Enabling caching
Flux Schnell is a large model, and downloading it from Hugging Face on every cold start would take several minutes. The Baseten Delivery Network (BDN) mirrors weights to Baseten’s infrastructure once and serves them from multi-tier caches close to your pods, so cold starts read from a nearby cache instead of re-downloading from upstream. To enable BDN, add a weights block to your config:
The load() method in model.py then reads weights from mount_location instead of pulling from Hugging Face.
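One way load() might choose where to read weights from is sketched below. The mount path /models/flux-schnell is hypothetical and would need to match the mount_location in your config. Falling back to the Hugging Face repository ID when the mount is absent is an assumption added for local development, not something the original example prescribes.

```python
import os

# Assumption: the upstream repository used when no local mount is present.
HF_REPO_ID = "black-forest-labs/FLUX.1-schnell"
# Hypothetical path; it must match mount_location in config.yaml.
MOUNT_LOCATION = "/models/flux-schnell"


def resolve_weights_source(mount_location=MOUNT_LOCATION, repo_id=HF_REPO_ID):
    """Return the mounted weights directory if it exists and is non-empty,
    otherwise fall back to the Hugging Face repository ID."""
    if os.path.isdir(mount_location) and os.listdir(mount_location):
        return mount_location
    return repo_id
```

The returned value can be passed to FluxPipeline.from_pretrained, which accepts either a local directory or a repository ID.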
Deploy the model
Deploy the model like you would other Trusses, using the truss push command.
Run an inference
Use a Python script to call the model once it’s deployed and parse its response. We parse the resulting base64-encoded string output into an actual image file, output_image.jpg.
infer.py
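A minimal sketch of such a script follows, using only the standard library. It is not the original infer.py. The endpoint URL pattern, the Api-Key authorization header, and the "data" response key are assumptions for this sketch; check them against your deployment. The model ID and API key are placeholders you supply.

```python
import base64
import json
import urllib.request


def build_request(model_id, api_key, prompt):
    """Build the POST request for the model's predict endpoint (assumed URL pattern)."""
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_base64_image(b64_string, path="output_image.jpg"):
    """Decode a base64 string and write the raw bytes to an image file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_string))
    return path


def run_inference(model_id, api_key, prompt, path="output_image.jpg"):
    """Call the deployed model and save the returned image to disk."""
    request = build_request(model_id, api_key, prompt)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the base64 image is returned under a "data" key.
    return save_base64_image(payload["data"], path)
```

Calling run_inference with your model ID, API key, and a prompt such as "a photo of a red fox in the snow" would write the decoded image to output_image.jpg in the current directory.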