If you need a deployed model to try the invocation examples below, follow these steps to create and deploy a super basic Truss that accepts and returns binary data. The Truss performs no operations and is purely illustrative.
1. Create a Truss
To create a Truss, run:
truss init binary_test
This creates a Truss in a new directory binary_test. By default, newly created Trusses implement an identity function that returns the exact input they are given.
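For reference, the generated binary_test/model/model.py scaffold looks roughly like this (a sketch only; the exact template depends on your Truss version):
class Model:
    def __init__(self, **kwargs):
        # Truss passes config values as keyword arguments
        self._model = None

    def load(self):
        # Load model weights here; nothing to load for this illustrative Truss
        pass

    def predict(self, model_input):
        # Identity function: return the input unchanged
        return model_input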
2. Add logging
Optionally, modify binary_test/model/model.py to log that the data received is of type bytes:
binary_test/model/model.py
def predict(self, model_input):
    # Run model inference here
    print(f"Input type: {type(model_input['byte_data'])}")
    return model_input
Once your Truss is deployed (for example, with truss push), you can call the model with binary data:
1. Set the content-type HTTP header to application/octet-stream.
2. Use msgpack to encode the data or file.
3. Make a POST request to the model.
This code sample assumes you have a file Gettysburg.mp3 in the current working directory. You can download the 11-second file from our CDN or replace it with your own file.
call_model.py
import os
import requests
import msgpack

model_id = "MODEL_ID"  # Replace with your model ID
deployment = "development"  # `development`, `production`, or a deployment ID
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Specify the URL to which you want to send the POST request
url = f"https://model-{model_id}.api.baseten.co/{deployment}/predict"

headers = {
    "Authorization": f"Api-Key {baseten_api_key}",
    "content-type": "application/octet-stream",
}

with open("Gettysburg.mp3", "rb") as file:
    response = requests.post(
        url,
        headers=headers,
        data=msgpack.packb({"byte_data": file.read()}),
    )

print(response.status_code)
print(response.headers)
To support certain types, such as numpy arrays and datetime values, you may need to extend client-side msgpack encoding with the same encoder and decoder used by Truss.
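As an illustration, here is a minimal client-side encoder/decoder pair using tagged dicts of our own choosing; the function names (encode_custom, decode_custom) and the dict format are assumptions for this sketch, and the hooks Truss itself uses may differ.
import datetime

import msgpack
import numpy as np

def encode_custom(obj):
    # Hypothetical encoder: turn numpy arrays and datetimes into tagged dicts
    if isinstance(obj, np.ndarray):
        return {"__ndarray__": True, "dtype": str(obj.dtype), "shape": list(obj.shape), "data": obj.tobytes()}
    if isinstance(obj, datetime.datetime):
        return {"__datetime__": True, "iso": obj.isoformat()}
    return obj

def decode_custom(obj):
    # Hypothetical decoder: reverse the tagged-dict encoding above
    if obj.get("__ndarray__"):
        return np.frombuffer(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])
    if obj.get("__datetime__"):
        return datetime.datetime.fromisoformat(obj["iso"])
    return obj

payload = msgpack.packb(
    {"created": datetime.datetime.now(), "embedding": np.arange(4, dtype=np.float32)},
    default=encode_custom,
)
restored = msgpack.unpackb(payload, object_hook=decode_custom)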
To use the output of a non-streaming model response, decode the response content.
call_model.py
# Continues `call_model.py` from above
binary_output = msgpack.unpackb(response.content)

# Change extension if not working with mp3 data
with open("output.mp3", "wb") as file:
    file.write(binary_output["byte_data"])
You can also stream output as binary. This is useful for sending large files or reading binary output as it is generated.
In model.py, the predict function must return a streaming output, such as a generator.
model/model.py
# Replace the predict function in your Truss
def predict(self, model_input):
    import os

    current_dir = os.path.dirname(__file__)
    file_path = os.path.join(current_dir, "tmpfile.txt")
    with open(file_path, mode="wb") as file:
        file.write(bytes(model_input["text"], encoding="utf-8"))

    def iterfile():
        # Get the directory of the current file
        current_dir = os.path.dirname(__file__)
        # Construct the full path to the temporary file
        file_path = os.path.join(current_dir, "tmpfile.txt")
        with open(file_path, mode="rb") as file_like:
            yield from file_like

    return iterfile()
Then, in your client, you can use streaming output directly without decoding.
stream_model.py
import os
import requests
import json

model_id = "MODEL_ID"  # Replace with your model ID
deployment = "development"  # `development`, `production`, or a deployment ID
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Specify the URL to which you want to send the POST request
url = f"https://model-{model_id}.api.baseten.co/{deployment}/predict"
headers = {
    "Authorization": f"Api-Key {baseten_api_key}",
}

s = requests.Session()
with s.post(
    # Endpoint for the chosen deployment, see the API reference for more
    url,
    headers=headers,
    data=json.dumps({"text": "Lorem Ipsum"}),
    # Include stream=True as an argument so the requests library knows to stream
    stream=True,
) as response:
    for token in response.iter_content(1):
        print(token)  # Prints bytes
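If the streamed output is a large binary file, you may prefer to write chunks to disk as they arrive instead of printing individual bytes. A minimal variation on the request above (the output filename is illustrative):
# Continues from the setup in `stream_model.py` above
with s.post(
    url,
    headers=headers,
    data=json.dumps({"text": "Lorem Ipsum"}),
    stream=True,
) as response, open("streamed_output.txt", "wb") as out_file:
    # Write the response to disk in 8 KB chunks as it streams in
    for chunk in response.iter_content(chunk_size=8192):
        out_file.write(chunk)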