Chains is in beta mode. Read our launch blog post.

Chains is a framework for building robust, performant multi-step and multi-model inference pipelines and deploying them to production. It addresses the common challenges of managing latency, cost and dependencies for complex workflows, while leveraging Truss’ existing battle-tested performance, reliability and developer toolkit.

From model to system

Some models are actually pipelines (e.g., invoking an LLM involves sequentially tokenizing the input, predicting the next token, and then decoding the predicted tokens). These pipelines generally make sense to bundle into a monolithic deployment because the steps share the same dependencies, require the same compute resources, and have a robust ecosystem of tooling for improving efficiency and performance in a single deployment. Many other pipelines and systems do not share these properties. Some examples include:

  • Running multiple different models in sequence.
  • Chunking/partitioning a set of files and concatenating/organizing results.
  • Pulling inputs from or saving outputs to a database or vector store.

Each step in these workflows has different hardware requirements, software dependencies, and scaling needs, so it doesn’t make sense to bundle them into a monolithic deployment. That’s where Chains comes in!
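Concretely, Chains lets each step declare its own image and compute. The sketch below is illustrative rather than authoritative: the field names (`RemoteConfig`, `DockerImage`, `Compute`) follow the Chains API as we understand it, but check the API reference for the current options.

```python
import truss_chains as chains


# A GPU-bound step declares its own dependencies and hardware...
class Transcribe(chains.ChainletBase):
    remote_config = chains.RemoteConfig(
        docker_image=chains.DockerImage(
            pip_requirements=["torch", "transformers"],
        ),
        compute=chains.Compute(gpu="A10G"),
    )

    async def run_remote(self, audio_url: str) -> str: ...


# ...while the CPU-only orchestration step stays lightweight.
@chains.mark_entrypoint
class Pipeline(chains.ChainletBase):
    remote_config = chains.RemoteConfig(
        compute=chains.Compute(cpu_count=2, memory="2Gi"),
    )

    def __init__(self, transcribe=chains.depends(Transcribe)) -> None:
        self._transcribe = transcribe
```

Each Chainlet deploys, scales, and bills independently, even though they are written side by side in one file.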

Six principles behind Chains

Chains exists to help you build multi-step, multi-model pipelines. The abstractions that Chains introduces are based on six opinionated principles: three for architecture and three for developer experience.

Architecture principles

1. Atomic components: Each step in the pipeline can set its own hardware requirements and software dependencies, separating GPU and CPU workloads.

2. Modular scaling: Each component has independent autoscaling parameters for targeted resource allocation, removing bottlenecks from your pipelines.

3. Maximum composability: Components specify a single public interface for flexible-but-safe composition and are reusable between projects.

Developer experience principles

1. Type safety and validation: Eliminate entire classes of bugs by writing typed Python code and validating inputs, outputs, module initializations, function signatures, and even remote server configurations.

2. Local debugging: Seamless local testing and cloud deployments: test Chains locally with support for mocking the output of any step, and simplify your cloud deployment loops by separating large model deployments from quick updates to glue code.

3. Incremental adoption: Use Chains to orchestrate existing model deployments, such as pre-packaged models from Baseten’s model library, alongside new model pipelines built entirely within Chains.

Hello World with Chains

Here’s a simple Chain that says “hello” to each person in a list of provided names:

hello_chain/hello.py
import asyncio
import truss_chains as chains


# This Chainlet does the work.
class SayHello(chains.ChainletBase):

    async def run_remote(self, name: str) -> str:
        return f"Hello, {name}"


# This Chainlet orchestrates the work.
@chains.mark_entrypoint
class HelloAll(chains.ChainletBase):

    def __init__(self, say_hello_chainlet=chains.depends(SayHello)) -> None:
        self._say_hello = say_hello_chainlet

    async def run_remote(self, names: list[str]) -> str:
        tasks = []
        for name in names:
            tasks.append(asyncio.ensure_future(
                self._say_hello.run_remote(name)))
        
        return "\n".join(await asyncio.gather(*tasks))

This is a toy example, but it shows how Chains can separate orchestration steps, like chunking or fan-out over a list, from workload execution steps. If SayHello were an LLM instead of a simple string template, we could perform a much more complex action for each person on the list.
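The fan-out in HelloAll is plain asyncio. Stripped of the Chains classes, the concurrency pattern looks like this standalone sketch (no truss_chains required), where `say_hello` stands in for a remote Chainlet call:

```python
import asyncio


async def say_hello(name: str) -> str:
    # Stand-in for a remote Chainlet call, e.g. an LLM invocation.
    await asyncio.sleep(0)  # simulate I/O-bound remote work
    return f"Hello, {name}"


async def hello_all(names: list[str]) -> str:
    # Dispatch all calls concurrently; gather preserves input order.
    tasks = [asyncio.ensure_future(say_hello(n)) for n in names]
    return "\n".join(await asyncio.gather(*tasks))


print(asyncio.run(hello_all(["Ada", "Grace"])))
# → Hello, Ada
#   Hello, Grace
```

Because the calls are awaited concurrently rather than in a loop, total latency is roughly that of the slowest call, not the sum of all calls.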

What to build with Chains

Get started by building and deploying your first chain.