Text-to-image and text-to-audio models like Stable Diffusion XL and MusicGen return the image or audio they generate as base64-encoded strings, which must be decoded and written to files before you can use them. This guide shows how to work with base64 output from these models.
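To make the decoding step concrete, here is a minimal sketch of the base64 round trip using only Python's standard library. The payload bytes are a made-up stand-in for model output, not a real image, and the encoding half only approximates what happens on the model side.

```python
import base64

# Stand-in for the raw bytes of a generated file (hypothetical, not real image data).
raw_bytes = b"\xff\xd8\xff\xe0 example payload"

# Roughly what the model side does: turn bytes into ASCII-safe text for a JSON response.
encoded = base64.b64encode(raw_bytes).decode("ascii")

# What your client does: turn the string back into the original bytes.
decoded = base64.b64decode(encoded)

print(decoded == raw_bytes)
```

The decoded bytes are what gets written to disk in the examples that follow.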
Example: Parsing Stable Diffusion output into a file
To follow this example, deploy Stable Diffusion XL from the model library.
Python invocation
In this example, we’ll use a Python script to call the model and parse the response.
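import base64
import os

import urllib3  # the top-level urllib3.request helper and json= need urllib3 2.x

# Fill in the ID of your deployed model
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Call the model's production endpoint
resp = urllib3.request(
    "POST",
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={"prompt": "A tree in a field under the night sky"},
)
image = resp.json()["data"]

# Decode the base64 string into raw image bytes
img = base64.b64decode(image)

# Name the file after the last 10 characters of the base64 string, dropping any "/"
file_name = f'{image[-10:].replace("/", "")}.jpeg'
with open(file_name, "wb") as img_file:
    img_file.write(img)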
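The file name above is built from the tail of the base64 string and always ends in .jpeg. As an optional alternative, the sketch below names the file from a hash of the decoded bytes and guesses the extension from the file's leading "magic bytes". The helpers `guess_extension` and `output_name` are hypothetical names introduced here for illustration, and the sample bytes are fake. Check what format your deployed model actually returns.

```python
import base64
import hashlib


def guess_extension(data: bytes) -> str:
    """Guess a file extension from leading magic bytes (hypothetical helper)."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    return ".bin"  # unknown format: fall back to a neutral extension


def output_name(data: bytes) -> str:
    """Build a filesystem-safe name from a short content hash (hypothetical helper)."""
    digest = hashlib.sha256(data).hexdigest()[:10]
    return f"{digest}{guess_extension(data)}"


# Fake payload that only mimics a JPEG header, standing in for resp.json()["data"].
sample = base64.b64decode(base64.b64encode(b"\xff\xd8\xff\xe0fake jpeg body"))
print(output_name(sample))
```

A hex digest contains only the characters 0-9 and a-f, so no character replacement is needed before using it as a file name.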
Truss CLI invocation
You can also use the Truss CLI and pipe the results into a similar Python script.
Command line:
truss predict -d '{"prompt": "A tree in a field under the night sky"}' | python save.py
Script:
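import base64
import json
import sys

# Read the JSON response that the Truss CLI writes to stdout
resp = json.loads(sys.stdin.read())
image = resp["data"]

# Decode the base64 string into raw image bytes
img = base64.b64decode(image)

# Name the file after the last 10 characters of the base64 string, dropping any "/"
file_name = f'{image[-10:].replace("/", "")}.jpeg'
with open(file_name, "wb") as img_file:
    img_file.write(img)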
Example: Parsing MusicGen output into multiple files
To follow this example, deploy MusicGen from the model library.
Python invocation
In this example, we’ll use a Python script to call the model and parse the response.
import base64
import os

import urllib3  # the top-level urllib3.request helper and json= need urllib3 2.x

# Fill in the ID of your deployed model
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Request one clip per prompt
resp = urllib3.request(
    "POST",
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={"prompts": ["happy rock", "energetic EDM", "sad jazz"], "duration": 8},
)
clips = resp.json()["data"]

# Decode each base64 string and write it to its own .wav file
for idx, clip in enumerate(clips):
    with open(f"clip_{idx}.wav", "wb") as f:
        f.write(base64.b64decode(clip))
Truss CLI invocation
You can also use the Truss CLI and pipe the results into a similar Python script.
Command line:
truss predict -d '{"prompts": ["happy rock", "energetic EDM", "sad jazz"], "duration": 8}' | python save.py
Script:
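import base64
import json
import sys

# Read the JSON response that the Truss CLI writes to stdout
resp = json.loads(sys.stdin.read())
clips = resp["data"]

# Decode each base64 string and write it to its own .wav file
for idx, clip in enumerate(clips):
    with open(f"clip_{idx}.wav", "wb") as f:
        f.write(base64.b64decode(clip))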
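If you save clips from more than one place, the decode-and-write loop can be wrapped in a small helper. The sketch below is one possible shape, not part of any Baseten or Truss API: `save_clips` is a hypothetical name, and it assumes each item in the list is a plain base64 string as in the examples above. It uses `validate=True` so that malformed input raises an error instead of being silently truncated.

```python
import base64
import binascii
from pathlib import Path


def save_clips(clips, out_dir="."):
    """Decode each base64 string and write it to clip_<idx>.wav (hypothetical helper)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for idx, clip in enumerate(clips):
        try:
            # validate=True rejects characters outside the base64 alphabet.
            audio = base64.b64decode(clip, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"clip {idx} is not valid base64") from exc
        target = out_path / f"clip_{idx}.wav"
        target.write_bytes(audio)
        written.append(target)
    return written
```

With the response parsed as in the scripts above, a call such as `save_clips(resp["data"], "clips")` would write the files into a `clips` directory and return their paths.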