Transcribe audio with Chains
Process hours of audio in seconds using efficient chunking, distributed inference, and optimized GPU resources.
View example on GitHub
This guide walks through building an audio transcription pipeline using Chains. You’ll break down large media files, distribute transcription tasks across autoscaling deployments, and leverage high-performance GPUs for rapid inference.
1. Overview
This Chain enables fast, high-quality transcription by:
- Partitioning long files (10+ hours) into smaller segments.
- Detecting silence to optimize split points.
- Parallelizing inference across multiple GPU-backed deployments.
- Batching requests to maximize throughput.
- Using range downloads for efficient data streaming.
- Leveraging `asyncio` for concurrent execution.
2. Chain Structure
Transcription is divided into two processing layers:
- Macro chunks: Large segments (~300s) split from the source media file. These are processed in parallel to handle massive files efficiently.
- Micro chunks: Smaller segments (~5–30s) extracted from macro chunks and sent to the Whisper model for transcription.
3. Implementing the Chainlets
`Transcribe` (Entrypoint Chainlet)
Handles transcription requests and dispatches tasks to worker Chainlets.
Function signature:
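The exact signature lives in the example code; the sketch below shows the general shape, assuming the `truss_chains` package, with illustrative parameter names and types:

```python
# Sketch of the Chainlet structure and entrypoint signature, assuming the
# `truss_chains` package; parameter and return types are illustrative, not
# the exact signature from the example repository.
import truss_chains as chains


class MacroChunkWorker(chains.ChainletBase):
    async def run_remote(self, media_url: str, start_sec: float, end_sec: float) -> str:
        ...  # covered in the next section


@chains.mark_entrypoint
class Transcribe(chains.ChainletBase):
    def __init__(self, macro_chunk_worker=chains.depends(MacroChunkWorker)) -> None:
        self._macro_chunk_worker = macro_chunk_worker

    async def run_remote(self, media_url: str, macro_chunk_size_sec: int = 300) -> str:
        # Validate range-download support, probe duration with FFmpeg,
        # fan out macro chunks, and merge the returned transcriptions.
        ...
```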
Steps:
- Validates that the media source supports range downloads.
- Uses FFmpeg to extract metadata and duration.
- Splits the file into macro chunks, optimizing split points at silent sections.
- Dispatches macro chunk tasks to the MacroChunkWorker for processing.
- Collects micro chunk transcriptions, merges results, and returns the final text.
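The dispatch-and-merge described in the last two steps boils down to an `asyncio` fan-out. The helper below is a simplified stand-in: `worker` represents the `MacroChunkWorker` dependency, and the naive splitter ignores the silence-aware boundaries used by the real Chain:

```python
# Illustrative fan-out/merge of macro chunks with asyncio.
import asyncio


def make_macro_chunks(duration_sec: float, chunk_size_sec: int = 300) -> list[tuple[float, float]]:
    # Naive fixed-size split; the real Chain also snaps boundaries to silence.
    starts = range(0, int(duration_sec), chunk_size_sec)
    return [(float(s), min(float(s + chunk_size_sec), duration_sec)) for s in starts]


async def transcribe_all(worker, media_url: str, duration_sec: float) -> str:
    tasks = [
        asyncio.ensure_future(worker.run_remote(media_url, start, end))
        for start, end in make_macro_chunks(duration_sec)
    ]
    partial_texts = await asyncio.gather(*tasks)  # all chunks processed concurrently
    return " ".join(partial_texts)
```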
Example request:
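The invocation URL and payload fields depend on your deployment; a request along these lines illustrates the shape:

```bash
# Illustrative request; substitute your Chain's invocation URL, API key,
# and a media URL that supports HTTP range downloads.
curl -X POST "https://chain-<CHAIN_ID>.api.baseten.co/production/run_remote" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"media_url": "https://example.com/interview.mp3"}'
```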
`MacroChunkWorker` (Processing Chainlet)
Processes macro chunks by:
- Extracting relevant time segments using FFmpeg.
- Streaming audio instead of downloading full files for low latency.
- Splitting segments at silent points.
- Encoding audio in base64 for efficient transfer.
- Distributing micro chunks to the Whisper model for transcription.
Multiple instances of this Chainlet run in parallel and are autoscaled dynamically.
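The extraction and encoding steps above can be sketched roughly as follows, assuming `ffmpeg` is available on the worker image (the helper name and option values are illustrative):

```python
# Condensed sketch of extracting one time range and base64-encoding it.
import base64
import subprocess


def extract_chunk_b64(media_url: str, start_sec: float, duration_sec: float) -> str:
    """Stream one time range from the source and return it base64-encoded."""
    cmd = [
        "ffmpeg",
        "-ss", str(start_sec),   # seek before decoding, so only this range is fetched
        "-i", media_url,         # ffmpeg streams remote inputs via HTTP range requests
        "-t", str(duration_sec),
        "-f", "wav",             # re-encode the segment as WAV for Whisper
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        "pipe:1",                # write to stdout instead of a file
    ]
    audio_bytes = subprocess.run(cmd, capture_output=True, check=True).stdout
    return base64.b64encode(audio_bytes).decode("utf-8")
```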
`WhisperModel` (Inference Model)
A separately deployed Whisper model Chainlet handles speech-to-text transcription.
- Deployed independently, so you can iterate on business logic without redeploying the model.
- Reusable across different Chains, or callable directly as a standalone model.
- The same deployed instance can serve multiple environments (e.g., dev and prod).
Whisper can also be deployed as a standard Truss model, separate from the Chain.
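Because the model is deployed separately, the Chain reaches it over HTTP. A minimal sketch, assuming the invocation URL is stored in the `WHISPER_URL` constant and with illustrative payload and response field names:

```python
# Minimal sketch of calling the independently deployed Whisper model.
# Replace WHISPER_URL with your deployment's invocation URL; the payload
# and response field names are illustrative.
import os

import httpx

WHISPER_URL = "https://model-<MODEL_ID>.api.baseten.co/production/predict"


async def transcribe_micro_chunk(client: httpx.AsyncClient, audio_b64: str) -> str:
    resp = await client.post(
        WHISPER_URL,
        headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
        json={"audio_b64": audio_b64},
        timeout=600.0,
    )
    resp.raise_for_status()
    return resp.json()["text"]
```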
4. Optimizing Performance
Because macro chunks are transcribed in parallel, processing time for very large files is governed by per-chunk latency rather than by total file duration.
Key performance tuning parameters:
- `micro_chunk_size_sec` → Balance GPU utilization and inference latency.
- `macro_chunk_size_sec` → Adjust chunk size for optimal parallelism.
- Autoscaling settings → Tune concurrency and replica counts for load balancing.
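As a starting point, the chunk sizes from section 2 map to constants like these (illustrative defaults, not tuned recommendations):

```python
# Illustrative defaults; tune for your workload and GPU type.
MACRO_CHUNK_SIZE_SEC = 300  # size of segments fanned out to MacroChunkWorker
MICRO_CHUNK_SIZE_SEC = 30   # upper bound on segments sent to Whisper per request
```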
Example speedup:
5. Deploy & Run the Chain
Deploy WhisperModel first:
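Assuming the Whisper Chainlet lives in its own source file (the file name below is illustrative), a typical push looks like:

```bash
# Deploy the Whisper Chainlet on its own; the file name is illustrative.
truss chains push whisper_chainlet.py
```

If you package Whisper as a standard Truss model instead, deploy it with `truss push` from the model directory.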
Copy the invocation URL and update `WHISPER_URL` in `transcribe.py`.
Deploy the transcription Chain:
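With `WHISPER_URL` updated, push the Chain from the file that defines the entrypoint:

```bash
truss chains push transcribe.py
```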
Run transcription on a sample file:
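For example, a short client script that posts a hosted file and prints the result (the URL, payload, and response fields are illustrative):

```python
# Illustrative client script; replace the URL, API key, and media file.
import os

import requests

CHAIN_URL = "https://chain-<CHAIN_ID>.api.baseten.co/production/run_remote"

resp = requests.post(
    CHAIN_URL,
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"media_url": "https://example.com/10_hour_lecture.mp3"},
    timeout=3600,
)
resp.raise_for_status()
print(resp.json())
```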
Next Steps
- Learn more about Chains.
- Optimize GPU autoscaling for peak efficiency.
- Extend the pipeline with custom business logic.