Overview
Chains: A new DX for deploying multi-component ML workflows
Chains is a framework for building robust, performant multi-step and multi-model inference pipelines and deploying them to production. It addresses the common challenges of managing latency, cost and dependencies for complex workflows, while leveraging Truss’ existing battle-tested performance, reliability and developer toolkit.
User Guides
Guides focus on specific features and use cases. See also the getting started guide and the general concepts page.
Design
How to structure your Chainlets, handle concurrency, and organize files
Local Dev
Iterating, Debugging, Testing, Mocking
Deploy
Deploy your Chain on Baseten
Invocation
Call your deployed Chain
Watch
Live-patch deployed code
Subclassing
Modularize and re-use Chainlet implementations
Streaming
Streaming outputs, reducing latency, SSEs
Binary IO
Performant serialization of numeric data
Error Propagation
Understanding and handling Chains errors
Truss Integration
Integrate deployed Truss models with stubs
From model to system
Some models are actually pipelines (e.g., invoking an LLM involves sequentially tokenizing the input, predicting the next token, and then decoding the predicted tokens). These pipelines generally make sense to bundle together in a monolithic deployment because their steps share the same dependencies, require the same compute resources, and benefit from a robust ecosystem of tooling that improves efficiency and performance in a single deployment. Many other pipelines and systems do not share these properties. Some examples include:
- Running multiple different models in sequence.
- Chunking/partitioning a set of files and concatenating/organizing results.
- Pulling inputs from or saving outputs to a database or vector store.
Each step in these workflows has different hardware requirements, software dependencies, and scaling needs, so it doesn’t make sense to bundle them in a monolithic deployment. That’s where Chains comes in!
Six principles behind Chains
Chains exists to help you build multi-step, multi-model pipelines. The abstractions that Chains introduces are based on six opinionated principles: three for architecture and three for developer experience.
Architecture principles
Atomic components
Each step in the pipeline can set its own hardware requirements and software dependencies, separating GPU and CPU workloads.
Modular scaling
Each component has independent autoscaling parameters for targeted resource allocation, removing bottlenecks from your pipelines.
Maximum composability
Components specify a single public interface for flexible-but-safe composition and are reusable across projects.
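As a rough illustration of these architecture principles, the sketch below models each pipeline step as a plain-Python record with its own hardware, dependencies, and scaling bounds. The names `StepSpec`, `gpu`, `pip_requirements`, `min_replicas`, and `max_replicas`, along with the example steps and values, are hypothetical and are not the Chains API. The point is only that each step carries its own settings instead of sharing one monolithic configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepSpec:
    """Hypothetical per-step settings; not part of the Chains API."""

    name: str
    gpu: Optional[str] = None  # None means a CPU-only step
    pip_requirements: tuple = ()
    min_replicas: int = 0
    max_replicas: int = 1


# Each step declares only what it needs (atomic components).
chunker = StepSpec(name="chunk_files", max_replicas=2)

# A GPU-bound step scales independently of the CPU step (modular scaling).
transcriber = StepSpec(
    name="transcribe",
    gpu="A10G",
    pip_requirements=("torch",),
    min_replicas=1,
    max_replicas=8,
)

pipeline = [chunker, transcriber]
gpu_steps = [step.name for step in pipeline if step.gpu is not None]
print(gpu_steps)  # prints ['transcribe']
```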
Developer experience principles
Type safety and validation
Eliminate entire classes of bugs by writing typed Python code and validating inputs, outputs, module initializations, function signatures, and even remote server configurations.
Local debugging
Test Chains locally, with support for mocking the output of any step, and shorten your cloud deployment loops by separating large model deployments from quick updates to glue code.
Incremental adoption
Use Chains to orchestrate existing model deployments, like pre-packaged models from Baseten’s model library, alongside new model pipelines built entirely within Chains.
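The local-debugging principle can be illustrated without any deployment: if a pipeline receives its steps as constructor arguments, a test can substitute a cheap fake for an expensive step. This is a generic plain-Python sketch, not the Chains mocking API; `Summarizer`, `FakeSummarizer`, `Report`, and the `run` method are hypothetical names chosen for illustration.

```python
class Summarizer:
    """Stand-in for an expensive step, e.g. one backed by a large model."""

    def run(self, text: str) -> str:
        raise RuntimeError("needs a deployed model; not available locally")


class FakeSummarizer:
    """Cheap local replacement that returns a predictable output."""

    def run(self, text: str) -> str:
        return text[:10]


class Report:
    """Glue code that depends on a summarizer step passed in from outside."""

    def __init__(self, summarizer) -> None:
        self._summarizer = summarizer

    def run(self, documents: list) -> list:
        return [self._summarizer.run(doc) for doc in documents]


# In a local test, inject the fake so only the glue code is exercised.
report = Report(summarizer=FakeSummarizer())
summaries = report.run(["first document body", "second document body"])
print(summaries)
```

Because the glue code never constructs the expensive step itself, the same `Report` class works unchanged with either the real or the fake dependency.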
Hello World with Chains
Here’s a simple Chain that says “hello” to each person in a list of provided names:
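A minimal sketch of that structure in plain Python follows. It assumes nothing about the Chains API: `SayHello` and `HelloAll` are ordinary classes here, and `HelloAll`, the `run` method, and the sample names are illustrative choices. In a real Chain, each class would be a Chainlet that can be deployed and scaled separately.

```python
import asyncio


class SayHello:
    """Workload step: greets a single person."""

    async def run(self, name: str) -> str:
        return f"Hello, {name}"


class HelloAll:
    """Entry step: fans the names out to SayHello and joins the results."""

    def __init__(self, say_hello: SayHello) -> None:
        self._say_hello = say_hello

    async def run(self, names: list) -> str:
        # Greet everyone concurrently; gather preserves the input order.
        greetings = await asyncio.gather(
            *(self._say_hello.run(name) for name in names)
        )
        return "\n".join(greetings)


result = asyncio.run(HelloAll(SayHello()).run(["Ada", "Grace", "Linus"]))
print(result)
```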
This is a toy example, but it shows how Chains can be used to separate preprocessing steps like chunking from workload execution steps. If SayHello were an LLM instead of a simple string template, we could perform a much more complex action for each person on the list.
What to build with Chains
Get started by building and deploying your first Chain.