How to do model I/O with files
Baseten supports a wide variety of file-based I/O approaches. These examples show our recommendations for working with files during model inference, whether local or remote, public or private, in the Truss or in your invocation code.
Files as input
Example: Send a file with JSON-serializable content
The Truss CLI has a -f flag for passing file input. If you're calling the API endpoint from Python, read the file contents with the standard f.read() function.
truss predict -f input.json
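For the Python route, here is a minimal sketch of reading a local input.json file and posting it to a production deployment. The model_id value is a placeholder, and the BASETEN_API_KEY environment variable and endpoint URL follow the same conventions used in the examples below.

import json
import os

import requests

model_id = ""  # Placeholder: your model ID
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Read the JSON-serializable payload from a local file
with open("input.json", "r") as f:
    data = json.loads(f.read())

# Send the payload to the production deployment
resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)
print(resp.json())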
Example: Send a file with non-serializable content
The -f flag for truss predict only applies to JSON-serializable content. For other files, like the audio files required by MusicGen Melody, the file contents need to be base64-encoded before they are sent.
import base64
import os

import urllib3

model_id = ""

# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Open a local file
with open("mymelody.wav", "rb") as f:  # mono WAV file, 48kHz sample rate
    # Convert file contents into a JSON-serializable format
    encoded_data = base64.b64encode(f.read())
    encoded_str = encoded_data.decode("utf-8")

# Define the data payload
data = {"prompts": ["happy rock", "energetic EDM", "sad jazz"], "melody": encoded_str, "duration": 8}

# Make the POST request
resp = urllib3.request(
    "POST",
    # Endpoint for production deployment, see API reference for more
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data
)

data = resp.json()["data"]

# Save output to files
for idx, clip in enumerate(data):
    with open(f"clip_{idx}.wav", "wb") as f:
        f.write(base64.b64decode(clip))
Example: Send a URL to a public file
Rather than encoding and serializing a file to send in the HTTP request, you can instead write a Truss that takes a URL as input and loads the file's content in the preprocess() function.
Here’s an example from Whisper in the model library.
from tempfile import NamedTemporaryFile

import requests
import whisper

# Get file content without blocking GPU
def preprocess(self, request):
    resp = requests.get(request["url"])
    return {"content": resp.content}

# Use file content in model inference
def predict(self, model_input):
    with NamedTemporaryFile() as fp:
        fp.write(model_input["content"])
        result = whisper.transcribe(
            self._model,
            fp.name,
            temperature=0,
            best_of=5,
            beam_size=5,
        )
        segments = [
            {"start": r["start"], "end": r["end"], "text": r["text"]}
            for r in result["segments"]
        ]
    return {
        "language": whisper.tokenizer.LANGUAGES[result["language"]],
        "segments": segments,
        "text": result["text"],
    }
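With this Truss deployed, the client only sends a URL; the file is downloaded server-side in preprocess(). Here is a sketch of an invocation, assuming a production deployment of this Whisper Truss, with the model_id and audio URL as placeholders.

import os

import requests

model_id = ""  # Placeholder: your Whisper deployment's model ID
baseten_api_key = os.environ["BASETEN_API_KEY"]

# The request body only needs a URL to a publicly accessible audio file
resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={"url": "https://example.com/audio.mp3"},
)
print(resp.json())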
Files as output
Example: Save model output to local file
When saving model output to a local file, there's nothing Baseten-specific about the code. Just use the standard > operator in bash or the file.write() function in Python to save the model output.
truss predict -d '"Model input!"' > output.json
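The Python equivalent is a plain file write. As a minimal sketch, assuming a JSON-serializable model response and a placeholder model_id:

import json
import os

import requests

model_id = ""  # Placeholder: your model ID
baseten_api_key = os.environ["BASETEN_API_KEY"]

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json="Model input!",
)

# Save the model output with a standard file write
with open("output.json", "w") as f:
    f.write(json.dumps(resp.json()))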
Output for some models, like image and audio generation models, may need to be decoded before you save it. See how to parse base64 output for detailed examples.