Chains is in beta mode. Read our launch blog post.

Designing the architecture of a Chain

A Chain is composed of multiple connecting Chainlets working together to perform a task.

For example, the Chain in the diagram below takes a large audio file, splits it into smaller chunks, transcribes the chunks in parallel to speed up transcription, and finally aggregates and returns the results.

To build an efficient end-to-end Chain, we recommend drafting your high-level structure as a flowchart or diagram. This will help you identify the Chainlets needed and how to link them.

If one Chainlet creates many “sub-tasks” by calling other dependency Chainlets (e.g. in a loop over partial work items), these calls should be made as asyncio tasks that run concurrently. That way you get the most out of the parallelism that Chains offers. This design pattern is used extensively in the audio transcription example.
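As a minimal sketch of this fan-out pattern, the example below uses plain asyncio rather than the Chains API. The names `transcribe_chunk` and `transcribe_all` are hypothetical, and the sleep only simulates the latency of a remote call to a dependency Chainlet.

```python
import asyncio
import time


async def transcribe_chunk(chunk: str) -> str:
    # Hypothetical stand-in for awaiting a dependency Chainlet's run_remote().
    await asyncio.sleep(0.1)  # simulated remote latency
    return chunk.upper()


async def transcribe_all(chunks: list[str]) -> list[str]:
    # Start one task per work item so the calls overlap instead of running one by one.
    tasks = [asyncio.ensure_future(transcribe_chunk(c)) for c in chunks]
    # gather() returns the results in the same order the tasks were passed in.
    return await asyncio.gather(*tasks)


def run_demo() -> tuple[list[str], float]:
    # Run the fan-out once and report the results and the wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(transcribe_all(["a", "b", "c", "d"]))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results, f"{elapsed:.2f}s")
```

Because the four simulated calls overlap, the total time should be close to one call's latency rather than the sum of all four.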

Local development

Chains are designed for production in replicated remote deployments. But alongside that production-ready power, we need great local development and deployment experiences.

Locally, a Chain is just Python files in a source tree. While that gives you a lot of flexibility in how you structure your code, there are some constraints and rules to follow to ensure successful distributed, remote execution in production.

The best thing you can do while developing locally with Chains is to run your code frequently, even if you do not have a __main__ section: the Chains framework runs various validations to help you catch issues early.

Additionally, running mypy and fixing reported type errors can help you find problems early and in a rapid feedback loop, before attempting a (much slower) deployment.

Test a Chain locally

Let’s revisit our “Hello World” Chain:

import asyncio
import truss_chains as chains

# This Chainlet does the work
class SayHello(chains.ChainletBase):

    async def run_remote(self, name: str) -> str:
        return f"Hello, {name}"

# This Chainlet orchestrates the work
class HelloAll(chains.ChainletBase):

    def __init__(self, say_hello_chainlet=chains.depends(SayHello)) -> None:
        self._say_hello = say_hello_chainlet

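    async def run_remote(self, names: list[str]) -> str:
        tasks = []
        for name in names:
            # Schedule each dependency call as a task so the calls run concurrently.
            tasks.append(asyncio.ensure_future(
                self._say_hello.run_remote(name)))

        return "\n".join(await asyncio.gather(*tasks))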

# Test the Chain locally
if __name__ == "__main__":
    with chains.run_local():
        hello_chain = HelloAll()
        result = asyncio.run(
            hello_chain.run_remote(["Marius", "Sid", "Bola"]))
        print(result)

When the __main__ module is run, local instances of the Chainlets are created, allowing you to test the functionality of your Chain just by executing the Python file:

cd hello_chain
# Hello, Marius
# Hello, Sid
# Hello, Bola

Mock execution of GPU Chainlets

Using run_local() to run your code locally requires that your development environment have the compute resources and dependencies that each Chainlet needs. But that often isn’t possible when building with AI models.

Chains offers a workaround, mocking, to let you test the coordination and business logic of your multi-step inference pipeline without worrying about running the model locally.

The second example in the getting started guide implements a Truss Chain for generating poems with Phi-3.

This Chain has two Chainlets:

  1. The PhiLLM Chainlet, which requires an NVIDIA A10G GPU.
  2. The PoemGenerator Chainlet, which easily runs on a CPU.

If you have an NVIDIA A10G under your desk, good for you. For the rest of us, we can mock the PhiLLM Chainlet, which is infeasible to run locally, so that we can quickly test the PoemGenerator Chainlet.

To do this, we define a mock Phi-3 model in our __main__ module and give it a run_remote() method that produces a test output that matches the output type we expect from the real Chainlet. Then, we inject an instance of this mock Chainlet into our Chain:

if __name__ == "__main__":
    class FakePhiLLM:
        def run_remote(self, prompt: str) -> str:
            # Single quotes inside the f-string keep this valid before Python 3.12.
            return f"Here's a poem about {prompt.split(' ')[-1]}"

    with chains.run_local():
        poem_generator = PoemGenerator(phi_llm=FakePhiLLM())
        result = poem_generator.run_remote(words=["bird", "plane", "superman"])
        print(result)

And run your Python file:

# ['Here's a poem about bird', 'Here's a poem about plane', 'Here's a poem about superman']

You may notice that the argument phi_llm expects a type PhiLLM, while we are passing it an instance of FakePhiLLM. These aren’t the same, which should be a type error.

However, this works at runtime because we constructed FakePhiLLM to use the same protocol as the real thing. We can make this explicit by defining a Protocol as a type annotation:

from typing import Protocol

class PhiProtocol(Protocol):
    def run_remote(self, data: str) -> str:
        ...

and changing the argument type in PoemGenerator:

class PoemGenerator(chains.ChainletBase):
    def __init__(self, phi_llm: PhiProtocol = chains.depends(PhiLLM)) -> None:
        self._phi_llm = phi_llm

This resolves the apparent type error.
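To see why a Protocol annotation accepts the mock, here is a self-contained sketch that does not use the Chains API. The names `TextModel`, `RealModel`, `FakeModel`, and `describe` are hypothetical and exist only for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextModel(Protocol):
    def run_remote(self, data: str) -> str:
        ...


class RealModel:
    # Hypothetical stand-in for a Chainlet that would need a GPU.
    def run_remote(self, data: str) -> str:
        return f"generated text for: {data}"


class FakeModel:
    # Mock with the same method name and signature, but no shared base class.
    def run_remote(self, data: str) -> str:
        return f"fake text for: {data}"


def describe(model: TextModel, prompt: str) -> str:
    # Accepts any object that structurally matches TextModel.
    return model.run_remote(prompt)


if __name__ == "__main__":
    print(describe(RealModel(), "bird"))
    print(describe(FakeModel(), "bird"))
```

A static checker such as mypy compares the method signatures structurally, so both classes are accepted without inheriting from `TextModel`. The `runtime_checkable` decorator is optional and only enables `isinstance` checks, which verify that the method exists but not its signature.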

Deploy your Chains

Deploying a Chain to production is an atomic action that deploys every Chainlet within the chain separately. Each Chainlet specifies its own remote environment — hardware resources, Python and system dependencies, autoscaling settings.

Development deployment

To deploy a Chain as a development deployment, run:

truss chains deploy ./

Here, the path argument points to the Python file that contains the entrypoint Chainlet for your Chain.

Development deployments are intended for testing and can’t scale past one replica. Each time you make a development deployment, it overwrites the existing development deployment.

Production deployment

To deploy a Chain as a production deployment, run:

truss chains deploy ./

Production deployments are intended for live traffic and have access to full autoscaling settings. Each time you deploy to production, a new deployment is created. Once the new deployment is live, it replaces the previous production deployment, which is relegated to the published deployments list.

Call a Chain’s API endpoint

Once your Chain is deployed, you can call it via its API endpoint. Chains use the same inference API as models.

Here’s an example which calls the development deployment:

import os

import requests

# From the Chain overview page on Baseten
# E.g. "https://model-<MODEL_ID>"
CHAIN_URL = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]
# JSON keys and types match the `run_remote` method signature.
data = {...}

resp = requests.post(
    CHAIN_URL,
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)
print(resp.json())

How to pass chain input

The data schema of the inference request corresponds to the function signature of run_remote() in your entrypoint Chainlet.

For example, for the Hello Chain, HelloAll.run_remote():

def run_remote(self, names: list[str]) -> str:

You’d pass the following JSON payload:

{"names": ["Marius", "Sid", "Bola"]}

That is, the keys in the JSON record match the argument names of run_remote(), and the values match the argument types.
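One way to catch a mismatched payload before sending a request is to compare its keys against the method signature locally. The sketch below is an illustration only, not how Chains validates requests: `check_payload` is a hypothetical helper, and `HelloAll` is a simplified stand-in that does not depend on truss_chains.

```python
import inspect
import json


class HelloAll:
    # Simplified stand-in for the entrypoint Chainlet; only the signature matters here.
    def run_remote(self, names: list[str]) -> str:
        return "\n".join(f"Hello, {name}" for name in names)


def check_payload(payload: dict, method) -> list[str]:
    """Return a list of problems; an empty list means the keys line up."""
    params = {
        name: param
        for name, param in inspect.signature(method).parameters.items()
        if name != "self"
    }
    # Arguments without a default value must be present in the payload.
    missing = [
        name
        for name, param in params.items()
        if param.default is inspect.Parameter.empty and name not in payload
    ]
    # Keys that do not correspond to any argument are flagged as well.
    unexpected = [key for key in payload if key not in params]
    return [f"missing argument: {name}" for name in missing] + [
        f"unexpected key: {key}" for key in unexpected
    ]


if __name__ == "__main__":
    payload = json.loads('{"names": ["Marius", "Sid", "Bola"]}')
    print(check_payload(payload, HelloAll.run_remote))
```

This checks names only. Verifying that each value matches its annotated type would need a validation library or additional code.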