Error handling in Chains follows the principle that the root cause “bubbles up” until the entrypoint, which returns an error response. This is similar to how Python stack traces contain all the layers from where an exception was raised up to the main function.

Consider a Chain where the entrypoint calls the run_remote method of a Chainlet named TextToNum, which in turn invokes TextReplicator. Each run_remote method may also call helper functions, and these appear in the call stack as well.
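For comparison, the same call structure in plain, single-process Python yields an ordinary traceback that lists every layer. This is a local analogue only: the functions are hypothetical stand-ins for the three run_remote methods, no Chains code is involved, and the 50-character limit is an assumption. In a real Chain each call crosses a network boundary, so one traceback cannot span them; that is why Chains nests each dependency's formatted stack inside the caller's error.

```python
import traceback


def validate_data(data: str) -> None:
    # Hypothetical limit standing in for the check done in TextReplicator.
    if len(data) > 50:
        raise ValueError(f"This input is too long: {len(data)}.")


def text_replicator(data: str) -> str:
    validate_data(data)
    return data * 2


def text_to_num(data: str) -> int:
    generated_text = text_replicator(data)
    return sum(ord(char) for char in generated_text)


def entrypoint(text: str) -> int:
    return text_to_num(text)


def frames_of_failed_call(text: str) -> list[str]:
    """Run the local chain and return the function names recorded in the traceback."""
    try:
        entrypoint(text)
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


print(frames_of_failed_call("x" * 100))
# ['frames_of_failed_call', 'entrypoint', 'text_to_num', 'text_replicator', 'validate_data']
```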

Below is an example stack trace that shows how the root cause (a ValueError) is propagated up to the entrypoint’s run_remote method (this is what you would see as an error log):

Chainlet-Traceback (most recent call last):
  File "/packages/itest_chain.py", line 132, in run_remote
    value = self._accumulate_parts(text_parts.parts)
  File "/packages/itest_chain.py", line 144, in _accumulate_parts
    value += self._text_to_num.run_remote(part)
ValueError: (showing chained remote errors, root error at the bottom)
├─ Error in dependency Chainlet `TextToNum`:
│   Chainlet-Traceback (most recent call last):
│     File "/packages/itest_chain.py", line 87, in run_remote
│       generated_text = self._replicator.run_remote(data)
│   ValueError: (showing chained remote errors, root error at the bottom)
│   ├─ Error in dependency Chainlet `TextReplicator`:
│   │   Chainlet-Traceback (most recent call last):
│   │     File "/packages/itest_chain.py", line 52, in run_remote
│   │       validate_data(data)
│   │     File "/packages/itest_chain.py", line 36, in validate_data
│   │       raise ValueError(f"This input is too long: {len(data)}.")
╰   ╰   ValueError: This input is too long: 100.

Exception handling and retries

The stack trace above is what you see if you don’t catch the exception. You can add error handling around each remote Chainlet invocation.

Chains tries to raise the same exception class on the caller Chainlet as was raised in the dependency Chainlet.

  • Builtin exceptions (e.g. ValueError) always work.
  • Custom or third-party exceptions (e.g. from torch) can only be raised in the caller if they are also included in the caller’s dependencies. If the exception class cannot be resolved, a GenericRemoteException is raised instead.
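The resolution rule can be pictured as a lookup by name with a fallback. The sketch below is not Chains’ implementation: it assumes the remote error is identified by a module name and a class name, the helper resolve_exception_class is hypothetical, and GenericRemoteException here is a local stand-in class defined only for illustration.

```python
import builtins
import importlib


class GenericRemoteException(Exception):
    """Local stand-in for the fallback exception type (illustration only)."""


def resolve_exception_class(module_name: str, class_name: str) -> type:
    """Look up an exception class in the caller's environment, else fall back."""
    if module_name == "builtins":
        # Builtin exceptions such as ValueError are always available.
        candidate = getattr(builtins, class_name, None)
    else:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The caller does not have the package among its dependencies.
            return GenericRemoteException
        candidate = getattr(module, class_name, None)
    if isinstance(candidate, type) and issubclass(candidate, BaseException):
        return candidate
    return GenericRemoteException


print(resolve_exception_class("builtins", "ValueError"))  # <class 'ValueError'>
print(resolve_exception_class("not_installed_pkg", "CustomError").__name__)  # GenericRemoteException
```

Under this sketch, a torch error would resolve only if torch is importable in the caller, mirroring the requirement that it be part of the caller’s dependencies.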

Note that the message of re-raised exceptions is the concatenation of the original message and the formatted stack trace of the dependency Chainlet.
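As a rough illustration of what such a message looks like, the sketch below rebuilds a caller-side exception from the dependency’s exception class, its original message, and its formatted stack. The helper name reraise_remote and the newline separator are assumptions; the exact layout Chains uses may differ.

```python
def reraise_remote(exc_cls: type, message: str, remote_stack: str) -> Exception:
    """Build a caller-side exception: original message followed by the remote stack."""
    # Assumed layout: original message, a newline, then the dependency's formatted stack.
    return exc_cls(f"{message}\n{remote_stack}")


remote_stack = (
    "Chainlet-Traceback (most recent call last):\n"
    '  File "/packages/itest_chain.py", line 36, in validate_data\n'
    '    raise ValueError(f"This input is too long: {len(data)}.")'
)
err = reraise_remote(ValueError, "This input is too long: 100.", remote_stack)
print(type(err).__name__)  # ValueError
print(str(err).splitlines()[0])  # This input is too long: 100.
```

A practical consequence: when catching a re-raised exception, match on its type rather than comparing str(exc) to the original message, because the message also carries the stack trace.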

In some cases it makes sense to simply retry a remote invocation, e.g. if it failed due to transient problems such as networking issues or other “flaky” components. For this, chains.depends can be configured with additional options, such as retries.
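Conceptually, a retry wraps the call in a loop that re-attempts it after a transient failure. The sketch below shows that idea in plain Python with a hypothetical FlakyService; it is not how chains.depends implements retries, and since the exact counting semantics of the retries option are not described here, the sketch defines its own max_attempts (total number of calls).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], max_attempts: int) -> T:
    """Call fn, retrying on ConnectionError until max_attempts calls have failed."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the error bubble up to the caller.
    raise ValueError("max_attempts must be at least 1")


class FlakyService:
    """Hypothetical dependency that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def run(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient network problem")
        return "ok"


service = FlakyService(failures=2)
print(call_with_retries(service.run, max_attempts=3), service.calls)  # ok 3
```

Only ConnectionError is retried here: a deterministic error such as the ValueError for an overly long input would fail the same way on every attempt, so it propagates immediately.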

The example below shows how to add automatic retries and error handling for the call to TextReplicator in TextToNum:

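import truss_chains as chains


class TextToNum(chains.ChainletBase):
    # `TextReplicator` is the dependency Chainlet from the stack trace above;
    # it must be defined (or imported) before this class.

    def __init__(
        self,
        # `retries` makes Chains automatically retry failed calls to TextReplicator.
        replicator: TextReplicator = chains.depends(TextReplicator, retries=3),
    ) -> None:
        self._replicator = replicator

    async def run_remote(self, data: ...):
        try:
            # Errors raised in TextReplicator are re-raised here, at the `await`.
            generated_text = await self._replicator.run_remote(data)
        except ValueError:
            ...  # Handle error, e.g. fall back to a default or re-raise.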
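The same try/except pattern can be exercised locally with asyncio. In the sketch below, FakeReplicator and LocalTextToNum are hypothetical stand-ins (no Chains involved), and the 50-character limit and the fall-back-to-empty-text strategy are assumptions chosen only to show one way of filling in the error branch.

```python
import asyncio


class FakeReplicator:
    """Hypothetical stand-in for the TextReplicator dependency."""

    async def run_remote(self, data: str) -> str:
        if len(data) > 50:
            raise ValueError(f"This input is too long: {len(data)}.")
        return data * 2


class LocalTextToNum:
    """Mirrors the try/except structure of TextToNum.run_remote with a local dependency."""

    def __init__(self, replicator: FakeReplicator) -> None:
        self._replicator = replicator

    async def run_remote(self, data: str) -> int:
        try:
            generated_text = await self._replicator.run_remote(data)
        except ValueError:
            # One possible strategy: fall back to an empty text instead of failing.
            generated_text = ""
        return sum(ord(char) for char in generated_text)


chainlet = LocalTextToNum(FakeReplicator())
print(asyncio.run(chainlet.run_remote("ab")))  # 390 ("abab": 97 + 98 + 97 + 98)
print(asyncio.run(chainlet.run_remote("x" * 100)))  # 0 (too long, fallback used)
```

Whether to fall back, re-raise, or raise a different error depends on what the caller can meaningfully do without the dependency’s result.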

Stack filtering

The stack trace is intended to show the user-implemented code in run_remote (and user-implemented helper functions). Under the hood, calls from one Chainlet to another go through an HTTP connection managed by the Chains framework, and each Chainlet itself runs as a FastAPI server with several layers of request-handling code “above” the user code.

To keep stack traces concise and readable, all of this non-user code is filtered out.
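The filtering idea can be sketched with Python’s standard traceback module: drop every frame whose source file lives in a “non-user” directory. This is an illustration under assumptions, not Chains’ actual filtering logic: the helper user_frames and the path-prefix rule are hypothetical, and the json package stands in for framework layers such as the HTTP and FastAPI code.

```python
import json
import os
import traceback


def user_frames(exc: BaseException, hidden_dirs: list[str]) -> list[str]:
    """Names of traceback frames whose source file is outside all hidden_dirs."""
    # Append a path separator so "/lib/json" does not also match "/lib/jsonx".
    prefixes = [os.path.join(d, "") for d in hidden_dirs]
    return [
        frame.name
        for frame in traceback.extract_tb(exc.__traceback__)
        if not any(frame.filename.startswith(p) for p in prefixes)
    ]


def parse_request(body: str) -> dict:
    # "User" code that calls into library code (json plays the framework here).
    return json.loads(body)


def handle(body: str) -> list[str]:
    try:
        parse_request(body)
    except json.JSONDecodeError as exc:
        return user_frames(exc, hidden_dirs=[os.path.dirname(json.__file__)])
    return []


print(handle("not json"))  # ['handle', 'parse_request']
```

The frames inside the json package (loads, decode, raw_decode) are still part of the real traceback; they are merely hidden from the summary, just as the framework layers are hidden from Chainlet stack traces.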