A Chain is composed of multiple connected Chainlets working together to perform
a task. For example, the Chain in the diagram below takes a large audio file as
input, splits it into smaller chunks, transcribes each chunk in parallel
(reducing the end-to-end latency), and finally aggregates and returns the
results.
To build an efficient Chain, we recommend drafting your high-level
structure as a flowchart or diagram. This can help you identify
parallelizable units of work and steps that need different (model/hardware)
resources.
If one Chainlet creates many “sub-tasks” by calling other dependency
Chainlets (e.g. in a loop over partial work items),
these calls should be made as asyncio tasks that run concurrently.
That way you get the most out of the parallelism that Chains offers. This
design pattern is extensively used in the
audio transcription example.
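
The split, parallel transcribe, and aggregate flow described above can be sketched with plain asyncio. This is a minimal illustration, not the Chains API: `split_audio`, `transcribe_chunk`, and `transcribe_file` are hypothetical names, and `transcribe_chunk` stands in for an RPC to a dependency Chainlet.

```python
import asyncio


def split_audio(audio: str, chunk_size: int) -> list[str]:
    # Hypothetical splitter: cuts the input into fixed-size chunks.
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]


async def transcribe_chunk(chunk: str) -> str:
    # Hypothetical stand-in for an async call to a dependency Chainlet.
    await asyncio.sleep(0.05)
    return chunk.upper()


async def transcribe_file(audio: str, chunk_size: int = 4) -> str:
    chunks = split_audio(audio, chunk_size)
    # Create one task per chunk so the calls run concurrently
    # instead of awaiting them one after another.
    tasks = [asyncio.ensure_future(transcribe_chunk(c)) for c in chunks]
    # gather returns the results in the order the tasks were passed in.
    partial_results = await asyncio.gather(*tasks)
    return "".join(partial_results)


if __name__ == "__main__":
    print(asyncio.run(transcribe_file("hello chains world")))
```

Because all chunk calls are in flight at the same time, the total latency is close to that of the slowest single call rather than the sum of all calls.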
While using asyncio is essential for performance, it can also be tricky.
Here are a few caveats to look out for:
Avoid executing operations in an async function that block the event loop for
more than a fraction of a second. This hinders the “flow” of processing
requests concurrently and starting RPCs to other Chainlets. Ideally, use
native async APIs. Frameworks like vLLM or Triton Inference Server offer such
APIs. Similarly, file downloads can be made async, and you might find
AsyncBatcher useful.
If there is no async support, consider running blocking code in a
thread/process pool (as an attribute of a Chainlet).
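
As a rough sketch of the thread pool approach, the example below keeps a `ThreadPoolExecutor` as an attribute and offloads a blocking call with `loop.run_in_executor`. The `Worker` class is a hypothetical plain-Python stand-in for a Chainlet, and `_blocking_step` simulates a library call without async support.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Hypothetical stand-in for a Chainlet; not the Chains API."""

    def __init__(self) -> None:
        # The pool is created once and reused across requests.
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _blocking_step(self, x: int) -> int:
        # Simulates blocking work, e.g. a synchronous client library.
        time.sleep(0.1)
        return x * x

    async def run(self, x: int) -> int:
        loop = asyncio.get_running_loop()
        # The blocking call runs in a pool thread, so the event loop
        # stays free to serve other requests in the meantime.
        return await loop.run_in_executor(self._pool, self._blocking_step, x)


async def main() -> tuple[list[int], float]:
    worker = Worker()
    start = time.perf_counter()
    results = await asyncio.gather(*(worker.run(i) for i in range(4)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A thread pool suits blocking I/O. For CPU-bound work, a `ProcessPoolExecutor` can be passed to `run_in_executor` in the same way, provided the function and its arguments are picklable.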
Creating async tasks (e.g. with asyncio.ensure_future) does not start
the task immediately. In particular, when starting several tasks in a loop,
ensure_future must be alternated with operations that yield to the event
loop, so that each task can be started. If the loop is not an async for loop
and contains no other await statements, a “dummy” await can be added, for example
await asyncio.sleep(0). This allows the tasks to be started concurrently.
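
The effect of the “dummy” await can be observed with a small experiment. In this sketch, `worker` and `spawn` are hypothetical names; the log records when each task is created and when it actually starts running.

```python
import asyncio


async def worker(i: int, log: list[str]) -> int:
    log.append(f"start {i}")
    await asyncio.sleep(0.01)  # simulated async work
    return i * 2


async def spawn(yield_to_loop: bool) -> tuple[list[str], list[int]]:
    log: list[str] = []
    tasks = []
    for i in range(3):
        # ensure_future only schedules the task; it has not run yet.
        tasks.append(asyncio.ensure_future(worker(i, log)))
        log.append(f"created {i}")
        if yield_to_loop:
            # Hand control to the event loop so the new task can start.
            await asyncio.sleep(0)
    results = await asyncio.gather(*tasks)
    return log, list(results)


if __name__ == "__main__":
    print(asyncio.run(spawn(yield_to_loop=False))[0])
    print(asyncio.run(spawn(yield_to_loop=True))[0])
```

Without the yield, all three tasks are created before any of them starts. With `await asyncio.sleep(0)` in the loop body, each task starts right after it is created, which matters when the loop body itself does noticeable synchronous work between task creations.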