User Guide
Core developer loops with Chains
Designing the architecture of a Chain
A Chain is composed of multiple connected Chainlets working together to perform a task.
For example, consider a transcription Chain that takes a large audio file, splits it into smaller chunks, transcribes each chunk in parallel to speed up the process, and finally aggregates and returns the results.
To build an efficient end-to-end Chain, we recommend drafting your high-level structure as a flowchart or diagram. This will help you identify the Chainlets needed and how to link them.
If one Chainlet creates many “sub-tasks” by calling other dependency Chainlets (e.g. in a loop over partial work items), these calls should be made as asyncio tasks that run concurrently. That way you get the most out of the parallelism that Chains offers. This design pattern is used extensively in the audio transcription example.
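As a minimal sketch of this pattern (the Chainlet names here are illustrative, not taken from the transcription example; the “Hello World” Chain below uses the same structure):

```python
import asyncio

import truss_chains as chains


class Worker(chains.ChainletBase):
    async def run_remote(self, item: str) -> str:
        # Stand-in for one unit of work, e.g. transcribing a single chunk.
        return f"processed {item}"


class Orchestrator(chains.ChainletBase):
    def __init__(self, worker=chains.depends(Worker)) -> None:
        self._worker = worker

    async def run_remote(self, items: list[str]) -> str:
        # Schedule one task per work item instead of awaiting each call in
        # turn, so the dependency calls run concurrently.
        tasks = [asyncio.ensure_future(self._worker.run_remote(it)) for it in items]
        return "\n".join(await asyncio.gather(*tasks))
```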
Local development
Chains are designed for production in replicated remote deployments. But alongside that production-ready power, we need great local development and deployment experiences.
Locally, a Chain is just Python files in a source tree. While that gives you a lot of flexibility in how you structure your code, there are some constraints and rules to follow to ensure successful distributed, remote execution in production.
The best thing you can do while developing locally with Chains is to run your code frequently, even if you do not have a `__main__` section: the Chains framework runs various validations at import time to help you catch issues early.
Additionally, running `mypy` and fixing reported type errors can help you find problems early and in a rapid feedback loop, before attempting a (much slower) deployment.
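For example, assuming your Chain lives in `hello.py` (as in the example below), this feedback loop can be as simple as:

```sh
python hello.py  # executes the file, triggering the framework's validations
mypy hello.py    # static type checking for fast feedback
```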
Complementary to purely local development, Chains also has a “watch” mode, similar to Truss; see the watch section below.
Test a Chain locally
Let’s revisit our “Hello World” Chain:
```python
import asyncio

import truss_chains as chains


# This Chainlet does the work
class SayHello(chains.ChainletBase):
    async def run_remote(self, name: str) -> str:
        return f"Hello, {name}"


# This Chainlet orchestrates the work
@chains.mark_entrypoint
class HelloAll(chains.ChainletBase):
    def __init__(self, say_hello_chainlet=chains.depends(SayHello)) -> None:
        self._say_hello = say_hello_chainlet

    async def run_remote(self, names: list[str]) -> str:
        tasks = []
        for name in names:
            tasks.append(asyncio.ensure_future(self._say_hello.run_remote(name)))
        return "\n".join(await asyncio.gather(*tasks))


# Test the Chain locally
if __name__ == "__main__":
    with chains.run_local():
        hello_chain = HelloAll()
        result = asyncio.get_event_loop().run_until_complete(
            hello_chain.run_remote(["Marius", "Sid", "Bola"])
        )
        print(result)
```
When the file is executed as the `__main__` module, local instances of the Chainlets are created, allowing you to test the functionality of your Chain just by executing the Python file:
```sh
cd hello_chain
python hello.py
# Hello, Marius
# Hello, Sid
# Hello, Bola
```
Mock execution of GPU Chainlets
Using `run_local()` to run your code locally requires that your development environment have the compute resources and dependencies that each Chainlet needs. But that often isn’t possible when building with AI models.
Chains offers a workaround, mocking, to let you test the coordination and business logic of your multi-step inference pipeline without worrying about running the model locally.
The second example in the getting started guide implements a Truss Chain for generating poems with Phi-3. This Chain has two Chainlets:

- The `PhiLLM` Chainlet, which requires an NVIDIA A10G GPU.
- The `PoemGenerator` Chainlet, which easily runs on a CPU.
If you have an NVIDIA A10G under your desk, good for you. For the rest of us, we can mock the `PhiLLM` Chainlet, which is infeasible to run locally, so that we can quickly test the `PoemGenerator` Chainlet.
To do this, we define a mock Phi-3 model in our `__main__` module and give it a `run_remote()` method that produces a test output matching the output type we expect from the real Chainlet. Then, we inject an instance of this mock Chainlet into our Chain:
```python
if __name__ == "__main__":

    class FakePhiLLM:
        def run_remote(self, prompt: str) -> str:
            return f"Here's a poem about {prompt.split(' ')[-1]}"

    with chains.run_local():
        poem_generator = PoemGenerator(phi_llm=FakePhiLLM())
        result = poem_generator.run_remote(words=["bird", "plane", "superman"])
        print(result)
```
And run your Python file:

```sh
python poems.py
# ["Here's a poem about bird", "Here's a poem about plane", "Here's a poem about superman"]
```
You may notice that the argument `phi_llm` expects a type `PhiLLM`, while we are passing it an instance of `FakePhiLLM`. These aren’t the same, which should be a type error.
However, this works at runtime because we constructed `FakePhiLLM` to use the same protocol as the real thing. We can make this explicit by defining a `Protocol` as a type annotation:
```python
from typing import Protocol


class PhiProtocol(Protocol):
    def run_remote(self, data: str) -> str: ...
```
and changing the argument type in `PoemGenerator`:
```python
@chains.mark_entrypoint
class PoemGenerator(chains.ChainletBase):
    def __init__(self, phi_llm: PhiProtocol = chains.depends(PhiLLM)) -> None:
        self._phi_llm = phi_llm
```
This resolves the apparent type error.
Chains Watch ✨ [new]
The watch command (`truss chains watch`) combines the best of local development and full deployment. `watch` lets you run on an exact copy of the production hardware and interface, but gives you live reload so you can test changes in seconds without creating a new deployment.
To use `truss chains watch`:

1. Push a chain in development mode (i.e. the `publish` and `promote` flags are false).
2. Run the watch command: `truss chains watch SOURCE`. You can also add the `watch` option to the `push` command to combine both into a single step (see the sketch after this list).
3. Each time you edit a file and save the changes, the watcher patches the remote deployments. Updating the deployments might take a moment, but it is generally much faster than creating a new deployment.
4. Call the chain with test data via cURL or the call dialogue in the UI and observe the result and logs.
5. Iterate steps 3 and 4 until your chain behaves in the desired way.
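A sketch of this workflow on the command line, assuming your entrypoint lives in `my_chain.py`:

```sh
# 1. Push in development mode (neither --publish nor --promote):
truss chains push ./my_chain.py

# 2. Start the live-reload watcher and leave it running while you edit:
truss chains watch ./my_chain.py
```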
Deploy your Chains
Deploying a Chain to production is an atomic action that deploys every Chainlet within the Chain separately. Each Chainlet specifies its own remote environment: hardware resources, Python and system dependencies, and autoscaling settings.
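For illustration, a sketch of how a GPU Chainlet can declare its own environment via `remote_config`; the `RemoteConfig`, `DockerImage`, and `Compute` helpers follow the `truss_chains` API, but treat the specific values here as placeholders:

```python
import truss_chains as chains


class PhiLLM(chains.ChainletBase):
    # Per-Chainlet remote environment: dependencies and compute resources
    # are scoped to this Chainlet only, not to the whole Chain.
    remote_config = chains.RemoteConfig(
        docker_image=chains.DockerImage(
            pip_requirements=["transformers", "torch"],
        ),
        compute=chains.Compute(cpu_count=2, gpu="A10G"),
    )
```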
Development deployment
To deploy a Chain as a development deployment, run:
```sh
truss chains push ./my_chain.py
```
where `my_chain.py` contains the entrypoint Chainlet for your Chain.
Development deployments are intended for testing and can’t scale past one replica. Each time you make a development deployment, it overwrites the existing development deployment.
Production deployment
To deploy a Chain as a production deployment, run:
```sh
truss chains push ./my_chain.py --promote
```
Production deployments are intended for live traffic and have access to full autoscaling settings. Each time you deploy to production, a new deployment is created. Once the new deployment is live, it replaces the previous production deployment, which is relegated to the published deployments list.
Call a Chain’s API endpoint
Once your Chain is deployed, you can call it via its API endpoint. Chains use the same inference API as models:
Here’s an example which calls the development deployment:
```python
import os

import requests

# From the Chain overview page on Baseten,
# e.g. "https://chain-<MODEL_ID>.api.baseten.co/development/run_remote"
CHAIN_URL = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

# JSON keys and types match the `run_remote` method signature.
data = {...}

resp = requests.post(
    CHAIN_URL,
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)
print(resp.json())
```
How to pass chain input
The data schema of the inference request corresponds to the function signature of `run_remote()` in your entrypoint Chainlet.
For example, for the Hello Chain, `HelloAll.run_remote()`:

```python
def run_remote(self, names: list[str]) -> str:
```
You’d pass the following JSON payload:

```json
{"names": ["Marius", "Sid", "Bola"]}
```
That is, the keys in the JSON record match the argument names and types of `run_remote()`.
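For instance, calling the Hello Chain’s development deployment with cURL (substituting your own chain URL from the overview page):

```sh
curl -X POST "https://chain-<MODEL_ID>.api.baseten.co/development/run_remote" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{"names": ["Marius", "Sid", "Bola"]}'
```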