Chains is in beta mode. Read our launch blog post.

This quickstart guide contains instructions for creating two Chains:

  1. A simple CPU-only โ€œhello worldโ€-Chain.
  2. A Chain that implements Phi-3 Mini and uses it to write poems.

Prerequisites

To use Chains, install a recent Truss version and ensure pydantic is v2:

pip install --upgrade truss 'pydantic>=2.0.0'

To deploy Chains remotely, you also need a Baseten account. It is handy to export your API key to the current shell session or permanently in your .bashrc:

~/.bashrc
export BASETEN_API_KEY="nPh8..."

Example: Hello World

Chains are written in Python files. In your working directory, create hello_chain/hello.py:

mkdir hello_chain
cd hello_chain
touch hello.py

In the file, weโ€™ll specify a basic Chain. It has two Chainlets:

  • HelloWorld, the entrypoint, which handles the input and output.
  • RandInt, which generates a random integer. It is used a as a dependency by HelloWorld.

Via the entrypoint, the Chain takes a maximum value and returns the string โ€ Hello World!โ€ repeated a variable number of times.

hello.py
import random
import truss_chains as chains


class RandInt(chains.ChainletBase):
    def run_remote(self, max_value: int) -> int:
        return random.randint(1, max_value)


@chains.mark_entrypoint
class HelloWorld(chains.ChainletBase):
    def __init__(self, rand_int=chains.depends(RandInt, retries=3)) -> None:
        self._rand_int = rand_int

    def run_remote(self, max_value: int) -> str:
        num_repetitions = self._rand_int.run_remote(max_value)
        return "Hello World! " * num_repetitions

The Chainlet class-contract

Exactly one Chainlet must be marked as the entrypoint with the @chains.mark_entrypoint decorator. This Chainlet is responsible for handling public-facing input and output for the whole Chain in response to an API call.

A Chainlet class has a single public method, run_remote(), which is the API endpoint for the entrypoint Chainlet and the function that other Chainlets can use as a dependency. The run_remote() method must be fully type-annotated with or .

Chainlets cannot be instantiated. The only correct usages are:

  1. Make one Chainlet depend on another one via the chains.depends() directive as an __init__-argument as shown above for the RandInt Chainlet.
  2. In the local debugging mode.

Beyond that, you can structure your code as you like, with private methods, imports from other files, and so forth.

Keep in mind that Chainlets are intended for distributed, replicated, remote execution, so using global variables, global state, and certain Python features like importing modules dynamically at runtime should be avoided as they may not work as intended.

Deploy your Chain to Baseten

To deploy your Chain to Baseten, run:

truss chains deploy hello.py

The deploy command results in an output like this:

                  โ›“๏ธ   HelloWorld - Chainlets  โ›“๏ธ
โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ Status               โ”‚ Name                    โ”‚ Logs URL    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  ๐Ÿ’š ACTIVE           โ”‚ HelloWorld (entrypoint) โ”‚ https://... โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  ๐Ÿ’š ACTIVE           โ”‚ RandInt (dep)           โ”‚ https://... โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ
Deployment succeeded.
You can run the chain with:
curl -X POST 'https://.../predict' \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -d '<JSON_INPUT>'

Wait for the status to turn to ACTIVE and test invoking your Chain (replace $PREDICT_URL in below command):

curl -X POST $PREDICT_URL \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -d '{"max_value": 10}'
# "Hello World! Hello World! Hello World! "

Example: Poetry with LLMs

Our second example also has two Chainlets, but is somewhat more complex and realistic. The Chainlets are:

  • PoemGenerator, the entrypoint, which handles the input and output and orchestrates calls to the LLM.
  • PhiLLM, which runs inference on Phi-3 Mini.

This Chain takes a list of words and returns a poem about each word, written by Phi-3. Hereโ€™s the architecture:

We build this Chain in a new working directory (if you are still inside hello_chain/, go up one level with cd .. first):

mkdir poetry_chain
cd poetry_chain
touch poems.py

A similar ent-to-end code example, using Mistral as an LLM, is available in the examples repo.

Building the LLM Chainlet

The main difference between this Chain and the previous one is that we now have an LLM that needs a GPU and more complex dependencies.

Copy the following code into poems.py:

poems.py
import asyncio
from typing import List

import pydantic
import truss_chains as chains
from truss import truss_config

PHI_HF_MODEL = "microsoft/Phi-3-mini-4k-instruct"
# This configures to cache model weights from the hunggingface repo
# in the docker image that is used for deploying the Chainlet.
PHI_CACHE = truss_config.ModelRepo(
    repo_id=PHI_HF_MODEL, allow_patterns=["*.json", "*.safetensors", ".model"]
)


class Messages(pydantic.BaseModel):
    messages: List[dict[str, str]]


class PhiLLM(chains.ChainletBase):
    # `remote_config` defines the resources required for this chainlet.
    remote_config = chains.RemoteConfig(
        docker_image=chains.DockerImage(
            # The phi model needs some extra python packages.
            pip_requirements=[
                "accelerate==0.30.1",
                "einops==0.8.0",
                "transformers==4.41.2",
                "torch==2.3.0",
            ]
        ),
        # The phi model needs a GPU and more CPUs.
        compute=chains.Compute(cpu_count=2, gpu="T4"),
        # Cache the model weights in the image
        assets=chains.Assets(cached=[PHI_CACHE]),
    )

    def __init__(self) -> None:
        # Note the imports of the *specific* python requirements are
        # pushed down to here. This code will only be executed on the
        # remotely deployed Chainlet, not in the local environment,
        # so we don't need to install these packages in the local
        # dev environment.
        import torch
        import transformers

        self._model = transformers.AutoModelForCausalLM.from_pretrained(
            PHI_HF_MODEL,
            torch_dtype=torch.float16,
            device_map="auto",
        )
        self._tokenizer = transformers.AutoTokenizer.from_pretrained(
            PHI_HF_MODEL,
        )
        self._generate_args = {
            "max_new_tokens": 512,
            "temperature": 1.0,
            "top_p": 0.95,
            "top_k": 50,
            "repetition_penalty": 1.0,
            "no_repeat_ngram_size": 0,
            "use_cache": True,
            "do_sample": True,
            "eos_token_id": self._tokenizer.eos_token_id,
            "pad_token_id": self._tokenizer.pad_token_id,
        }

    async def run_remote(self, messages: Messages) -> str:
        import torch

        model_inputs = self._tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        inputs = self._tokenizer(model_inputs, return_tensors="pt")
        input_ids = inputs["input_ids"].to("cuda")
        with torch.no_grad():
            outputs = self._model.generate(input_ids=input_ids, **self._generate_args)
            output_text = self._tokenizer.decode(outputs[0], skip_special_tokens=True)
        return output_text

Building the entrypoint

Now that we have an LLM, we can use it in a poem generator Chainlet. Add the following code to poems.py:

poems.py
@chains.mark_entrypoint
class PoemGenerator(chains.ChainletBase):
    def __init__(self, phi_llm: PhiLLM = chains.depends(PhiLLM)) -> None:
        self._phi_llm = phi_llm

    async def run_remote(self, words: list[str]) -> list[str]:
        tasks = []
        for word in words:
            messages = Messages(
                messages=[
                    {
                        "role": "system",
                        "content": (
                            "You are poet who writes short, "
                            "lighthearted, amusing poetry."
                        ),
                    },
                    {"role": "user", "content": f"Write a poem about {word}"},
                ]
            )
            tasks.append(asyncio.ensure_future(self._phi_llm.run_remote(messages)))
        return list(await asyncio.gather(*tasks))

Note that we use asyncio.ensure_future around each RPC to the LLM chainlet. This makes the current python process start these remote calls concurrently, i.e. the next call is started before the previous one has finished and we can minimize our overall runtime. In order to await the results of all calls, asyncio.gather is used which gives us back normal python objects. If the LLM is hit with many concurrent requests, it can auto-scale up (if autoscaling is configure). More advanced LLM models have batching capabilities, so for those even a single instance can serve concurrent request.

Deploy your Chain to Baseten

To deploy your Chain to Baseten, run:

truss chains deploy poems.py

Wait for the status to turn to ACTIVE and test invoking your Chain (replace $PREDICT_URL in below command):

curl -X POST $PREDICT_URL \
    -H "Authorization: Api-Key $BASETEN_API_KEY" \
    -d '{"words": ["bird", "plane", "superman"]}'
#[[
#"<s> [INST] Generate a poem about: bird [/INST] In the quiet hush of...</s>",
#"<s> [INST] Generate a poem about: plane [/INST] In the vast, boudl...</s>",
#"<s> [INST] Generate a poem about: superman [/INST] In the realm where...</s>"
#]]