Text-to-image and text-to-audio models like Stable Diffusion XL and MusicGen return the images or audio they create as base64-encoded strings, which need to be decoded and saved as files. This guide provides examples for working with base64 output from these models.
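
At its core, this is a two-step process: decode the base64 string into raw bytes, then write those bytes to a file opened in binary mode. Here's a minimal sketch of the pattern, where output is a stand-in for the string a model would return:

import base64

# Stand-in for the base64-encoded string a model would return
output = base64.b64encode(b"example bytes").decode()

# Decode the string back into raw bytes and write them to disk in binary mode
with open("output.bin", "wb") as f:
    f.write(base64.b64decode(output))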

Example: Parsing Stable Diffusion output into a file

To follow this example, deploy Stable Diffusion XL from the model library.

Python invocation

In this example, we’ll use a Python script to call the model and parse the response.

call.py
import urllib3
import base64
import os

# Model ID for production deployment
model_id = ""
# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Call the model
resp = urllib3.request(
    "POST",
    # Endpoint for production deployment, see API reference for more
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={"prompt": "A tree in a field under the night sky"}
)
image = resp.json()["data"]
# Decode image from base64 model output
img = base64.b64decode(image)
# Name the file using the last characters of the base64 string
file_name = f'{image[-10:].replace("/", "")}.jpeg'
# Save image to file
with open(file_name, "wb") as img_file:
    img_file.write(img)
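
To confirm the decoded bytes are a valid image, you can append a quick check to call.py using Pillow (installed separately with pip install pillow; not part of the model library example):

from PIL import Image

# Open the saved file and print its format and dimensions as a sanity check
with Image.open(file_name) as img:
    print(img.format, img.size)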

Truss CLI invocation

You can also use the Truss CLI and pipe the results into a similar Python script.

Command line:

truss predict -d '{"prompt": "A tree in a field under the night sky"}' | python save.py

Script:

save.py
import json
import base64
import sys

# Read piped input from truss predict
resp = json.loads(sys.stdin.read())
image = resp["data"]
# Decode image from base64 model output
img = base64.b64decode(image)
# Name the file using the last characters of the base64 string
file_name = f'{image[-10:].replace("/", "")}.jpeg'
# Save image to file
with open(file_name, "wb") as img_file:
    img_file.write(img)
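
If you want to keep the raw response around, you can redirect it to a file and feed it to the script later; save.py only reads from stdin, so either form works:

truss predict -d '{"prompt": "A tree in a field under the night sky"}' > response.json
python save.py < response.json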

Example: Parsing MusicGen output into multiple files

To follow this example, deploy MusicGen from the model library.

Python invocation

In this example, we’ll use a Python script to call the model and parse the response.

call.py
import urllib3
import base64
import os

# Model ID for production deployment
model_id = ""
# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Call the model
resp = urllib3.request(
    "POST",
    # Endpoint for production deployment, see API reference for more
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={"prompts": ["happy rock", "energetic EDM", "sad jazz"], "duration": 8}
)
clips = resp.json()["data"]
# Decode clips from base64 and save output to files
for idx, clip in enumerate(clips):
    with open(f"clip_{idx}.wav", "wb") as f:
        f.write(base64.b64decode(clip))
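
To sanity-check the decoded clips, you can append a few lines to call.py that read the files back with Python's built-in wave module and print each clip's duration (this assumes the output is standard PCM WAV, which the .wav extension suggests):

import wave

# Print the duration of each saved clip as a quick sanity check
for idx in range(len(clips)):
    with wave.open(f"clip_{idx}.wav", "rb") as w:
        print(f"clip_{idx}.wav: {w.getnframes() / w.getframerate():.1f}s")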

Truss CLI invocation

You can also use the Truss CLI and pipe the results into a similar Python script.

Command line:

truss predict -d '{"prompts": ["happy rock", "energetic EDM", "sad jazz"], "duration": 8}' | python save.py

Script:

save.py
import json
import base64
import sys

# Read piped input from truss predict
resp = json.loads(sys.stdin.read())
clips = resp["data"]
# Decode clips from base64 and save output to files
for idx, clip in enumerate(clips):
    with open(f"clip_{idx}.wav", "wb") as f:
        f.write(base64.b64decode(clip))
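
As a variation, you could pass the same prompts to the script as arguments and name the files after them instead of by index. This is a sketch, not part of the model library example, and save_named.py is a hypothetical name:

save_named.py
import json
import base64
import sys

# Read piped input from truss predict
resp = json.loads(sys.stdin.read())
clips = resp["data"]
# Prompts passed as command-line arguments, e.g.:
# truss predict -d '{"prompts": ["happy rock", "energetic EDM", "sad jazz"], "duration": 8}' | python save_named.py "happy rock" "energetic EDM" "sad jazz"
names = sys.argv[1:]
for idx, clip in enumerate(clips):
    # Fall back to an index-based name if no prompt was passed for this position
    name = names[idx].replace(" ", "_") if idx < len(names) else f"clip_{idx}"
    with open(f"{name}.wav", "wb") as f:
        f.write(base64.b64decode(clip))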