A Chain is composed of multiple connected Chainlets working together to perform a task.

For example, the Chain in the diagram below takes a large audio file as input, splits it into smaller chunks, transcribes each chunk in parallel (reducing end-to-end latency), and finally aggregates and returns the results.

To build an efficient Chain, we recommend drafting your high-level structure as a flowchart or diagram. This can help you identify parallelizable units of work and steps that need different resources (models or hardware).

If one Chainlet creates many “sub-tasks” by calling other dependency Chainlets (e.g. in a loop over partial work items), these calls should be made as asyncio tasks that run concurrently. That way you get the most out of the parallelism that Chains offer. This design pattern is used extensively in the audio transcription example.
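As a minimal sketch of this fan-out pattern in plain asyncio (not the Chains API itself), the hypothetical transcribe_chunk coroutine below stands in for an RPC to a dependency Chainlet, with asyncio.sleep simulating its latency; split_into_chunks and transcribe_file are likewise illustrative names. asyncio.gather runs all chunk calls concurrently and returns their results in input order, so aggregation is a simple join.

```python
import asyncio


async def transcribe_chunk(chunk: bytes) -> str:
    # Hypothetical stand-in for a call to a dependency Chainlet.
    # asyncio.sleep simulates network/GPU latency without blocking the loop.
    await asyncio.sleep(0.1)
    return f"<{len(chunk)} bytes>"


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    # Fixed-size byte splitting for illustration; a real audio pipeline
    # would split on time boundaries instead.
    return [data[i : i + chunk_size] for i in range(0, len(data), chunk_size)]


async def transcribe_file(data: bytes, chunk_size: int) -> str:
    chunks = split_into_chunks(data, chunk_size)
    # gather wraps each coroutine in a task, so all calls are in flight at once,
    # and it returns results in the same order as the chunks.
    results = await asyncio.gather(*(transcribe_chunk(c) for c in chunks))
    # Aggregate the partial results into one response.
    return " ".join(results)


if __name__ == "__main__":
    print(asyncio.run(transcribe_file(b"x" * 25, chunk_size=10)))
```

Because the three simulated calls overlap, the whole run takes roughly 0.1 s rather than 0.3 s.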

While using asyncio is essential for performance, it can also be tricky. Here are a few caveats to look out for:

  • Blocking the event loop. Any operation in an async function that blocks the event loop for more than a fraction of a second stalls the concurrent processing of requests and delays starting RPCs to other Chainlets. Ideally, use native async APIs: frameworks like vLLM or Triton Inference Server offer them, file downloads can be made async, and you might find AsyncBatcher useful. If there is no async support, consider running the blocking code in a thread or process pool (kept as an attribute of the Chainlet).
  • Creating an async task (e.g. with asyncio.ensure_future) does not start it immediately; it only schedules the task, which begins running the next time the event loop gets control. In particular, when starting several tasks in a loop, calls to ensure_future must be interleaved with operations that yield to the event loop, so that each task can actually start. If the loop is not an async for loop and contains no other await statements, add a “dummy” await, for example await asyncio.sleep(0). This allows the tasks to start concurrently instead of only after the loop has finished.
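For the first caveat, here is a minimal sketch of offloading a blocking call, assuming a hypothetical blocking_preprocess function that has no async API. PreprocessingChainlet is a plain Python class standing in for a Chainlet, not the Chains API; it keeps a ThreadPoolExecutor as an attribute and uses loop.run_in_executor so the event loop stays free while the call runs.

```python
import asyncio
import concurrent.futures
import time


def blocking_preprocess(x: int) -> int:
    # Hypothetical blocking call with no async API (simulated with time.sleep).
    time.sleep(0.1)
    return x * 2


class PreprocessingChainlet:
    """Hypothetical stand-in for a Chainlet; a plain class, not the Chains API."""

    def __init__(self, max_workers: int = 4) -> None:
        # Created once and stored as an attribute, so all requests reuse
        # the same pool instead of starting new threads per call.
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)

    async def process(self, x: int) -> int:
        loop = asyncio.get_running_loop()
        # Run the blocking function in a worker thread; awaiting the future
        # keeps the event loop free to serve other requests meanwhile.
        return await loop.run_in_executor(self._executor, blocking_preprocess, x)


async def main() -> list[int]:
    chainlet = PreprocessingChainlet()
    # Four concurrent requests share the four-thread pool.
    return await asyncio.gather(*(chainlet.process(i) for i in range(4)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Threads suit I/O-bound code or libraries that release the GIL; for pure-Python CPU-bound work, a ProcessPoolExecutor is usually the better choice, though its arguments and return values must then be picklable.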
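For the second caveat, the sketch below records when each task is created and when it actually starts running. call_dependency is a hypothetical stand-in for an RPC to a dependency Chainlet, and fan_out is an illustrative name. Without the dummy await, no task starts until the loop finishes and gather hands control back to the event loop; with await asyncio.sleep(0) after each ensure_future, every task starts (and would send its RPC) before the next one is created.

```python
import asyncio


async def call_dependency(i: int, log: list[str]) -> int:
    # Hypothetical stand-in for an RPC to a dependency Chainlet.
    log.append(f"started {i}")
    await asyncio.sleep(0.01)  # simulated network latency
    return i * i


async def fan_out(n: int, yield_each_iteration: bool) -> tuple[list[int], list[str]]:
    log: list[str] = []
    tasks = []
    for i in range(n):
        # ensure_future only schedules the coroutine; it runs once the
        # event loop regains control.
        tasks.append(asyncio.ensure_future(call_dependency(i, log)))
        log.append(f"created {i}")
        if yield_each_iteration:
            # "Dummy" await: yields to the event loop so the task that was
            # just created can start before the next iteration.
            await asyncio.sleep(0)
    results = await asyncio.gather(*tasks)
    return results, log


if __name__ == "__main__":
    print(asyncio.run(fan_out(3, yield_each_iteration=False))[1])
    print(asyncio.run(fan_out(3, yield_each_iteration=True))[1])
```

In the first run the log shows all three "created" entries before any "started" entry; in the second, creation and start alternate.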