Deploy Whisper V3
Unlike LLMs that use pre-built engines, Whisper V3 is typically deployed as a custom Python Truss. This allows you to include preprocessing logic, such as using ffmpeg to handle a variety of audio formats.
Configuration
The config.yaml specifies the hardware requirements and the system packages needed for audio processing. Whisper V3 runs efficiently on an A10G or L4 GPU.
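As a starting point, a minimal config.yaml might look like the following. The dependency list and accelerator choice are illustrative rather than exact; adjust them to your deployment.

```yaml
model_name: whisper-v3
python_version: py311
requirements:
  - openai-whisper
  - torch
system_packages:
  - ffmpeg          # needed to decode and resample incoming audio
resources:
  accelerator: A10G
  use_gpu: true
```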
Model implementation
The model.py file defines the Model class, which loads the Whisper weights and handles inference requests. We use ffmpeg to convert incoming audio URLs into the 16 kHz mono waveform that Whisper expects.
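The shape of such a model.py is sketched below, assuming the openai-whisper package and an input payload with a `url` key; both the key name and the checkpoint name are assumptions you should match to your own setup.

```python
import subprocess
import tempfile
import urllib.request


def ffmpeg_command(src_path: str, dst_path: str) -> list[str]:
    """Build the ffmpeg invocation that resamples any input to
    16 kHz mono WAV, the format Whisper expects."""
    return [
        "ffmpeg", "-y", "-i", src_path,
        "-ar", "16000",   # resample to 16 kHz
        "-ac", "1",       # downmix to a single channel
        dst_path,
    ]


class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Deferred import so module import stays cheap; whisper is only
        # needed once the Truss server loads the model.
        import whisper
        self._model = whisper.load_model("large-v3")

    def predict(self, request: dict) -> dict:
        # The `url` input key is an assumption; match it to your client.
        with tempfile.NamedTemporaryFile(suffix=".bin") as src, \
             tempfile.NamedTemporaryFile(suffix=".wav") as dst:
            urllib.request.urlretrieve(request["url"], src.name)
            subprocess.run(ffmpeg_command(src.name, dst.name), check=True)
            result = self._model.transcribe(dst.name)
        return {"text": result["text"]}
```

Keeping the ffmpeg invocation in a small helper makes the conversion step easy to test without a GPU.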
Run inference
Because this is a custom Python model, you use the predict endpoint to send an audio URL and receive the transcription.
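For illustration, a plain-Python client for the predict endpoint could look like this. The URL pattern follows Baseten's per-model endpoints, and the `url` input key is an assumption that must match what your model.py reads from the request.

```python
import json
import urllib.request


def predict_url(model_id: str) -> str:
    """Production predict endpoint for a deployed Baseten model."""
    return f"https://model-{model_id}.api.baseten.co/production/predict"


def transcribe(model_id: str, api_key: str, audio_url: str) -> dict:
    # The `url` input key is an assumption; it must match the payload
    # key your model.py expects.
    req = urllib.request.Request(
        predict_url(model_id),
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The equivalent cURL call posts the same JSON body with the `Authorization: Api-Key ...` header.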
Configuration and tuning
Whisper V3 is highly versatile but can be resource-intensive for long audio files.
Latency vs. Cost
For real-time applications, you may want to use a smaller variant like medium or small if the accuracy trade-off is acceptable. These variants run significantly faster and can be deployed on smaller, cheaper GPUs like the T4.
Preprocessing
Using ffmpeg inside the predict method lets your model handle almost any audio or video format automatically. However, for extremely high-volume workloads, consider moving audio preprocessing to a separate service or a Chain so the GPU is not blocked during conversion.
Related
- Model APIs — Instant access to transcription without dedicated infrastructure.
- Chains documentation — Orchestrate complex transcription pipelines.
- Truss documentation — Build and customize your own transcription models.