How to do model I/O in binary
Decode and save binary model output
Baseten and Truss natively support model I/O in binary and use msgpack encoding for efficiency.
Deploy a basic Truss for binary I/O
If you need a deployed model to try the invocation examples below, follow these steps to create and deploy a super basic Truss that accepts and returns binary data. The Truss performs no operations and is purely illustrative.
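For reference, here is a minimal sketch of what such a model.py might look like. The class and method names follow the standard Truss model interface; the echo behavior and the byte_data key are illustrative choices, not requirements.

# model/model.py — illustrative Truss model that echoes binary input
class Model:
    def __init__(self, **kwargs):
        pass

    def load(self):
        # Nothing to load for this illustrative model
        pass

    def predict(self, model_input):
        # Return the bytes that were sent in, unchanged
        return {"byte_data": model_input["byte_data"]}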
Send raw bytes as model input
To send binary data as model input:
- Set the content-type HTTP header to application/octet-stream
- Use msgpack to encode the data or file
- Make a POST request to the model
This code sample assumes you have a file Gettysburg.mp3 in the current working directory. You can download the 11-second file from our CDN or replace it with your own file.
import os
import requests
import msgpack
model_id = "MODEL_ID" # Replace with your model ID
deployment = "development" # `development`, `production`, or a deployment ID
baseten_api_key = os.environ["BASETEN_API_KEY"]
# Specify the URL to which you want to send the POST request
url = f"https://model-{model_id}.api.baseten.co/{deployment}/predict"
headers = {
    "Authorization": f"Api-Key {baseten_api_key}",
    "content-type": "application/octet-stream",
}

with open('Gettysburg.mp3', 'rb') as file:
    response = requests.post(
        url,
        headers=headers,
        data=msgpack.packb({'byte_data': file.read()})
    )
print(response.status_code)
print(response.headers)
To support certain types, such as numpy arrays and datetime values, you may need to extend client-side msgpack encoding with the same encoder and decoder used by Truss.
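As an illustration, here is a sketch of what an extended client-side encoder and decoder might look like. It assumes the msgpack_numpy package is installed, and the __datetime__ marker key is a hypothetical convention; for exact compatibility, copy the encoder and decoder from the Truss source rather than relying on this sketch.

import datetime

import msgpack
import msgpack_numpy  # handles numpy arrays; assumed to be installed

def encode_custom(obj):
    # Tag datetime values with a marker key so they can be restored on decode;
    # fall back to msgpack_numpy for numpy arrays and other types
    if isinstance(obj, datetime.datetime):
        return {"__datetime__": True, "value": obj.isoformat()}
    return msgpack_numpy.encode(obj)

def decode_custom(obj):
    # Restore tagged datetime values; hand everything else to msgpack_numpy
    if "__datetime__" in obj:
        return datetime.datetime.fromisoformat(obj["value"])
    return msgpack_numpy.decode(obj)

packed = msgpack.packb({"created_at": datetime.datetime.now()}, default=encode_custom)
restored = msgpack.unpackb(packed, object_hook=decode_custom)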
Parse raw bytes from model output
To use the output of a non-streaming model response, decode the response content.
# Continues `call_model.py` from above
binary_output = msgpack.unpackb(response.content)
# Change extension if not working with mp3 data
with open('output.mp3', 'wb') as file:
    file.write(binary_output["byte_data"])
Streaming binary outputs
You can also stream output as binary. This is useful for sending large files or reading binary output as it is generated.
In model.py, your predict function must return a streaming output, such as a generator that yields bytes.
# Replace the predict function in your Truss
def predict(self, model_input):
    import os

    # Write the input text to a temporary file alongside model.py
    current_dir = os.path.dirname(__file__)
    file_path = os.path.join(current_dir, "tmpfile.txt")
    with open(file_path, mode="wb") as file:
        file.write(bytes(model_input["text"], encoding="utf-8"))

    def iterfile():
        # Get the directory of the current file
        current_dir = os.path.dirname(__file__)
        # Construct the full path to the temporary file
        file_path = os.path.join(current_dir, "tmpfile.txt")
        with open(file_path, mode="rb") as file_like:
            yield from file_like

    return iterfile()
Then, in your client, you can use streaming output directly without decoding.
import os
import requests
import json
model_id = "MODEL_ID" # Replace with your model ID
deployment = "development" # `development`, `production`, or a deployment ID
baseten_api_key = os.environ["BASETEN_API_KEY"]
# Specify the URL to which you want to send the POST request
url = f"https://model-{model_id}.api.baseten.co/{deployment}/predict"

headers = {
    "Authorization": f"Api-Key {baseten_api_key}",
}

s = requests.Session()
with s.post(
    url,
    headers=headers,
    data=json.dumps({"text": "Lorem Ipsum"}),
    # Include stream=True as an argument so the requests library knows to stream
    stream=True,
) as response:
    for token in response.iter_content(1):
        print(token)  # Prints bytes
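When the streamed output is a large binary file, you will usually want to write the chunks to disk rather than print them byte by byte. A minimal sketch, continuing the client above (the output.bin filename and 8 KB chunk size are arbitrary choices):

# Continues the streaming client above: save the streamed bytes to a file
with s.post(
    url,
    headers=headers,
    data=json.dumps({"text": "Lorem Ipsum"}),
    stream=True,
) as response:
    with open("output.bin", "wb") as file:
        # Write the response to disk in 8 KB chunks instead of printing byte by byte
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)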