Local Development
Iterating, Debugging, Testing, Mocking
Chains are designed for production in replicated remote deployments. But alongside that production-ready power, we offer great local development and deployment experiences.
The 6 principles behind Chains
Chains exists to help you build multi-step, multi-model pipelines. The abstractions that Chains introduces are based on six opinionated principles: three for architecture and three for developer experience.
Architecture principles
Atomic components
Each step in the pipeline can set its own hardware requirements and software dependencies, separating GPU and CPU workloads.
Modular scaling
Each component has independent autoscaling parameters for targeted resource allocation, removing bottlenecks from your pipelines.
Maximum composability
Components specify a single public interface for flexible-but-safe composition and are reusable between projects.
Developer experience principles
Type safety and validation
Eliminate entire taxonomies of bugs by writing typed Python code and validating inputs, outputs, module initializations, function signatures, and even remote server configurations.
Local debugging
Seamless local testing and cloud deployments: test Chains locally with support for mocking the output of any step and simplify your cloud deployment loops by separating large model deployments from quick updates to glue code.
Incremental adoption
Use Chains to orchestrate existing model deployments, like pre-packaged models from Baseten’s model library, alongside new model pipelines built entirely within Chains.
Locally, a Chain is just Python files in a source tree. While that gives you a lot of flexibility in how you structure your code, there are some constraints and rules to follow to ensure successful distributed, remote execution in production.
The best thing you can do while developing locally with Chains is to run your code frequently, even if you do not have a __main__ section: the Chains framework runs various validations at module initialization to help you catch issues early.
Additionally, running mypy and fixing reported type errors can help you find problems early in a rapid feedback loop, before attempting a (much slower) deployment.
To complement purely local development, Chains also has a “watch” mode, like Truss; see the watch guide.
Test a Chain locally
Let’s revisit our “Hello World” Chain:
When the __main__ module is run, local instances of the Chainlets are created, allowing you to test the functionality of your Chain just by executing the Python file:
Mock execution of GPU Chainlets
Using run_local() to run your code locally requires that your development environment have the compute resources and dependencies that each Chainlet needs. But that often isn’t possible when building with AI models.
Chains offers a workaround, mocking, to let you test the coordination and business logic of your multi-step inference pipeline without worrying about running the model locally.
The second example in the getting started guide implements a Truss Chain for generating poems with Phi-3.
This Chain has two Chainlets:
- The PhiLLM Chainlet, which requires an NVIDIA A10G GPU.
- The PoemGenerator Chainlet, which easily runs on a CPU.
If you have an NVIDIA A10G under your desk, good for you. For the rest of us, we can mock the PhiLLM Chainlet, which is infeasible to run locally, so that we can quickly test the PoemGenerator Chainlet.
To do this, we define a mock Phi-3 model in our __main__ module and give it a run_remote() method that produces a test output that matches the output type we expect from the real Chainlet. Then, we inject an instance of this mock Chainlet into our Chain:
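As a framework-free sketch of this injection pattern: the classes below are plain Python stand-ins, not real Chainlets, and they do not use the Chains API or run_local(). The prompt text, the mock output string, and the method signatures are illustrative assumptions; only the names PhiLLM, FakePhiLLM, PoemGenerator, phi_llm, and run_remote() come from the example described here.

```python
class PhiLLM:
    """Stand-in for the GPU-backed Chainlet (illustrative, not the real class)."""

    def run_remote(self, prompt: str) -> str:
        # The real Chainlet would run Phi-3 here; that is not feasible locally.
        raise RuntimeError("PhiLLM needs a GPU and model weights")


class FakePhiLLM:
    """Mock with the same method name, argument, and return type as PhiLLM."""

    def run_remote(self, prompt: str) -> str:
        # Return a deterministic string of the expected output type.
        return f"[mock poem] {prompt}"


class PoemGenerator:
    """CPU-only coordination logic: the part we want to test locally."""

    def __init__(self, phi_llm) -> None:
        # The dependency is passed in, so a mock can replace the real model.
        self._phi_llm = phi_llm

    def run_remote(self, words: list[str]) -> list[str]:
        # One model call per word, results returned in input order.
        return [
            self._phi_llm.run_remote(f"Write a poem about {word}")
            for word in words
        ]


if __name__ == "__main__":
    # Inject the mock instead of the GPU-backed model.
    poem_generator = PoemGenerator(phi_llm=FakePhiLLM())
    for poem in poem_generator.run_remote(["bamboo", "panda"]):
        print(poem)
```

Because PoemGenerator only ever calls run_remote() on whatever it was given, swapping in the mock exercises the coordination logic without loading a model.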
And run your Python file:
Typing of mocks
You may notice that the argument phi_llm expects a type PhiLLM, while we pass an instance of FakePhiLLM. These aren’t the same, which is formally a type error.
However, this works at runtime because we constructed FakePhiLLM to implement the same protocol as the real thing. We can make this explicit by defining a Protocol as a type annotation:
and changing the argument type in PoemGenerator:
This is a bit more work and not needed to execute the code, but it shows how typing consistency can be achieved if desired.
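A minimal sketch of the Protocol approach, using only the standard library's typing module. The name PhiLLMProtocol, the run_remote() signature, and the prompt text are assumptions for illustration, and the classes are plain Python stand-ins rather than real Chainlets. The runtime_checkable decorator is optional and is included only so the structural match can also be checked with isinstance().

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PhiLLMProtocol(Protocol):
    """Structural type: anything with a matching run_remote() satisfies it."""

    def run_remote(self, prompt: str) -> str:
        ...


class FakePhiLLM:
    """Mock that satisfies PhiLLMProtocol without inheriting from it."""

    def run_remote(self, prompt: str) -> str:
        return "mock poem"


class PoemGenerator:
    # Annotating with the protocol (hypothetical name) instead of the concrete
    # class lets a type checker accept both the real model and the mock.
    def __init__(self, phi_llm: PhiLLMProtocol) -> None:
        self._phi_llm = phi_llm

    def run_remote(self, word: str) -> str:
        return self._phi_llm.run_remote(f"Write a poem about {word}")


generator = PoemGenerator(phi_llm=FakePhiLLM())
```

With this annotation, a type checker such as mypy should accept the FakePhiLLM argument, since protocol matching is structural rather than based on inheritance.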