Input data as bytes
Most models on Baseten let you send input data as a Base64 string inside a JSON payload. This is convenient for many use cases, but Base64 encoding inflates binary data by about a third and adds an encode step on the client and a decode step on the server. That serialization overhead can matter for latency-sensitive applications.
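To make the size overhead concrete, here is a small standard-library sketch. Base64 emits 4 ASCII characters for every 3 input bytes, so the payload grows before JSON wrapping is even counted. The one-second 16 kHz audio buffer and the "audio" key are illustrative assumptions, not part of any Baseten API.

```python
import base64
import json


def base64_json_payload(raw: bytes) -> bytes:
    """Wrap raw bytes the convenient way: Base64 text inside a JSON object.

    The "audio" key is a hypothetical field name used only for illustration.
    """
    return json.dumps({"audio": base64.b64encode(raw).decode("ascii")}).encode("utf-8")


# One second of 16 kHz, 16-bit mono PCM audio: 16,000 samples * 2 bytes.
raw_audio = bytes(32_000)
encoded = base64.b64encode(raw_audio)
payload = base64_json_payload(raw_audio)

print(len(raw_audio))  # 32000
print(len(encoded))    # 42668: 4 chars per 3-byte group, last group padded
print(len(payload))    # 42681: Base64 text plus the JSON wrapper
```

The server also has to parse the JSON and decode the Base64 string before the model ever sees the audio, which is extra work on every request.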
For scenarios where low latency is critical, such as real-time transcription with Whisper, you can avoid this overhead by sending the input data directly as raw bytes. Skipping the Base64 round trip keeps payloads smaller and reduces serialization work, which helps lower end-to-end latency.
When sending audio data as bytes to a Whisper endpoint on Baseten from Python, two details are key:
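- Setting the "Content-Type" header to "application/octet-stream".
- Serializing the request body with msgpack, as in data=msgpack.packb(body), instead of sending a standard JSON object. msgpack encodes bytes values as native binary, so the audio needs no Base64 encoding.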
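The two details above can be combined in a minimal sketch using the requests and msgpack packages. Several pieces are assumptions rather than confirmed API details: the endpoint URL pattern, the "Api-Key" authorization scheme, the BASETEN_MODEL_ID and BASETEN_API_KEY environment variable names, and the "audio" field name are placeholders, and the sketch assumes your Whisper deployment accepts msgpack-encoded input. Check your model's own documentation for its actual URL and input schema.

```python
import os

import msgpack
import requests

# Hypothetical placeholders: substitute your own model ID and API key.
MODEL_ID = os.environ.get("BASETEN_MODEL_ID", "your-model-id")
API_KEY = os.environ.get("BASETEN_API_KEY", "your-api-key")
# Assumed URL pattern for a production deployment; verify it for your model.
PREDICT_URL = f"https://model-{MODEL_ID}.api.baseten.co/production/predict"


def build_headers(api_key: str) -> dict:
    """Headers declaring that the body is raw binary, not JSON."""
    return {
        "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
        "Content-Type": "application/octet-stream",
    }


def build_body(audio_bytes: bytes) -> bytes:
    """Pack the request with msgpack so the audio stays binary (no Base64).

    "audio" is a hypothetical field name; use the input schema your
    Whisper deployment expects.
    """
    body = {"audio": audio_bytes}
    return msgpack.packb(body)


def transcribe(audio_bytes: bytes, timeout: float = 60.0) -> requests.Response:
    """POST the msgpack-encoded audio to the model endpoint."""
    resp = requests.post(
        PREDICT_URL,
        headers=build_headers(API_KEY),
        data=build_body(audio_bytes),  # raw bytes, not json=...
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp
```

To send a local file, read it in binary mode and pass the contents to transcribe. How to parse the response depends on what your deployment returns. Note that the request uses data= with the packed bytes rather than json=, which would force JSON serialization and defeat the purpose.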