# Loops SDK

> Python client for Loops: ServiceClient, TrainingClient, SamplingClient, and the Tinker compatibility shim.

The Loops Python SDK exposes three classes for use in training scripts. `ServiceClient` provisions trainer and sampling servers on the Baseten control plane and manages the session that ties them together. `TrainingClient` runs forward passes, backward passes, and optimizer steps against a live trainer server. `SamplingClient` generates completions from the sampler's weights; a client returned by `save_weights_and_get_sampling_client` is pinned to the version you just published, so you sample from exactly the checkpoint you trained. The three classes mirror their Tinker counterparts, and the method names are the same.

## Installation

The main client package is `baseten-loops`. The Tinker compatibility shim ships as the `[tinker]` extra, which pulls in the `baseten-loops-tinker` wheel and provides the `tinker` namespace. From a uv project (see the [quickstart](/loops/quickstart#install) for project setup), install both with the extra:

```bash theme={"system"}
uv add 'baseten-loops[tinker]'
```

```python theme={"system"}
from baseten.loops import ServiceClient, TrainingClient, SamplingClient

# Tinker-compatible namespace (provided by baseten-loops[tinker])
import tinker
```

## Minimal example

One LoRA round trip: provision a trainer, run a forward and backward pass over a single masked prompt-and-answer pair, step the optimizer, publish the weights, and sample from them.

```python theme={"system"}
from baseten.loops import (
    ServiceClient, Datum, ModelInput, TensorData, AdamParams, SamplingParams,
)

service_client = ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3.5-2B", rank=16,
)

# Tokenize one prompt/answer pair and mask the prompt positions from the loss.
tokenizer = training_client.get_tokenizer()
prompt = tokenizer.encode("What is the capital of France?\nAnswer:", add_special_tokens=False)
answer = tokenizer.encode(" Paris", add_special_tokens=False)
tokens = prompt + answer
targets = [-100] * len(prompt) + answer

datum = Datum(
    model_input=ModelInput.from_ints(tokens),
    loss_fn_inputs={"target_tokens": TensorData(data=targets, dtype="int64", shape=[len(targets)])},
)

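# Submit each operation, then block on .result() until it finishes.
training_client.forward_backward(data=[datum]).result(timeout=600.0)
training_client.optim_step(AdamParams(learning_rate=4e-5)).result(timeout=600.0)

# Publish the weights and get a SamplingClient pinned to the new version.
sampling_client = training_client.save_weights_and_get_sampling_client(name="step-1").result(timeout=600.0)

# Sample one completion from the prompt tokens and print the generated token IDs.
result = sampling_client.sample(
    prompt=ModelInput.from_ints(prompt), num_samples=1, sampling_params=SamplingParams(max_tokens=16),
)
print(result.sequences[0].tokens)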
```

Each long-running call follows a submit-then-`.result()` pattern: the submit validates its arguments and returns immediately, and `.result()` long-polls until the operation finishes. Provisioning the trainer can take several minutes on a fresh base model, so the first call blocks the longest. The `sampling_client` returned by `save_weights_and_get_sampling_client` is pinned to the version you just trained, so the sample reflects this step's weights. For the end-to-end walkthrough with expected output, see the [Loops quickstart](/loops/quickstart).

## ServiceClient: provision

`ServiceClient` is the entry point for every session. It calls the Baseten control plane to create a `TrainerSession`, then provisions trainer and sampling servers within that session on demand.

<ParamField body="ServiceClient(training_project_id=None, *, api_key=None, base_url=None, reuse_from_session_id=None)" type="ServiceClient">
  Construct a `ServiceClient` and create a new `TrainerSession` on the Baseten control plane. Omit `training_project_id` to use the default project for the org, or pass one to target a specific training project. `api_key` defaults to the `BASETEN_API_KEY` environment variable.

  Pass `reuse_from_session_id` to reuse a prior session's trainer and sampler for `create_lora_training_client` and `create_sampling_client` calls instead of provisioning fresh. The named session must belong to the same team. `ServiceClient` reads the `LOOPS_REUSE_FROM_SESSION_ID` environment variable when no kwarg is passed; the kwarg wins when both are set. Reuse is best-effort: if the prior deployment is stopped, failed, or unhealthy, a fresh one is provisioned and the call still succeeds. See [Reusing infrastructure across sessions](/loops/concepts#reusing-infrastructure-across-sessions).
</ParamField>
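The precedence between the `reuse_from_session_id` kwarg and the `LOOPS_REUSE_FROM_SESSION_ID` environment variable can be sketched in plain Python. This is an illustration of the documented rule only, not the SDK's implementation: `resolve_reuse_session_id` is a hypothetical helper, the session IDs are placeholders, and treating an empty variable as unset is an assumption of this sketch.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "LOOPS_REUSE_FROM_SESSION_ID"  # variable name taken from the text above


def resolve_reuse_session_id(
    kwarg_value: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: the kwarg wins, the env var is the fallback."""
    env = os.environ if environ is None else environ
    if kwarg_value is not None:
        return kwarg_value
    # Assumption of this sketch: an empty variable counts as unset.
    return env.get(ENV_VAR) or None


# Both are set, so the kwarg value is the one that is used.
print(resolve_reuse_session_id("sess-kwarg", {ENV_VAR: "sess-env"}))
```

When neither source is set, the helper returns `None`, which corresponds to provisioning fresh infrastructure.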

<ParamField body="ServiceClient.local(*, trainer_url, sampler_url)" type="ServiceClient">
  Bind to already-running local trainer and sampler processes without contacting the control plane. Pass `trainer_url` and `sampler_url` as the base URLs of local server processes. Useful for end-to-end testing.
</ParamField>

<ParamField body="create_lora_training_client(base_model, rank=32, seed=None, timeout=600.0, ready_timeout=3600.0, wandb_config=None)" type="TrainingClient">
  Provision a `TrainerServer` for the given Hugging Face `base_model` and return a connected `TrainingClient`. The control plane also provisions a paired sampling server in the same call; `save_weights_and_get_sampling_client` uses that paired URL to gate on version readiness. Pass a `WandbConfig` instance to stream training metrics to a Weights & Biases run.
</ParamField>

<ParamField body="create_sampling_client(base_model, timeout=300.0, ready_timeout=3600.0, model_path=None)" type="SamplingClient">
  Provision a standalone `SamplingServer` for `base_model` and return a connected `SamplingClient`. Use this when you want to sample from a base model independently of a training run. The `model_path` argument is reserved and not yet implemented; passing it raises `NotImplementedError`. To sample from a specific checkpoint, use `TrainingClient.create_sampling_client(model_path=...)` on a live run instead.
</ParamField>

<ParamField body="get_server_capabilities()" type="ServerCapabilities">
  Return the control plane's view of supported base models and the GPU classes it can provision them on. Useful for confirming a base model is available before calling `create_lora_training_client`.
</ParamField>

<ParamField body="list_checkpoints(run_id)" type="list[Checkpoint]">
  List checkpoints saved by the run identified by `run_id`. Calls the Baseten API, not the trainer server directly.
</ParamField>

<ParamField body="get_checkpoint_archive_url(checkpoint_id, page_size=1000, page_token=0)" type="CheckpointFilesResponse">
  Return presigned URLs for every file in the specified checkpoint folder. Checkpoint IDs are globally unique, so no run scoping is required. The Loops stack writes checkpoints as unzipped directories rather than archives, so this returns a file list instead of a single archive URL.
</ParamField>

<ParamField body="session_id" type="str">
  Property. The session ID assigned by the control plane. Available after construction.
</ParamField>

## TrainingClient: train

`TrainingClient` talks directly to a `dp_worker` instance. Long-running operations use a submit-and-retrieve protocol: the submit fires immediately on the calling thread (so validation errors surface at call time) and `.result()` long-polls the server until the operation finishes. You can submit multiple operations before awaiting any of them.

<ParamField body="forward_backward(data, loss_fn=&#x22;cross_entropy&#x22;, loss_fn_config=None)" type="ForwardBackwardFuture">
  Run a forward and backward pass over `data` (a list of `Datum` objects) using the specified loss function. Returns a `ForwardBackwardFuture`; call `.result()` to block until the pass completes and retrieve the loss.
</ParamField>

<ParamField body="forward(data, loss_fn=&#x22;cross_entropy&#x22;, loss_fn_config=None)" type="ForwardBackwardFuture">
  Run a forward pass without gradient computation. Same inputs and output shape as `forward_backward`, but the gradient buffer is left untouched, so it is safe to interleave with gradient accumulation steps.
</ParamField>

<ParamField body="optim_step(adam_params)" type="OperationFuture[OptimStepResponse]">
  Apply the accumulated gradients using the Adam optimizer configured by `adam_params`. Call this after one or more `forward_backward` calls.
</ParamField>

<ParamField body="save_state(name, ttl_seconds=None)" type="OperationFuture[SaveWeightsResponse]">
  Persist a local training checkpoint under `name`. When a weight sync URI is configured server-side, `save_state` also publishes the LoRA adapter so a polling sampler can hot-swap to the new weights.
</ParamField>

<ParamField body="save_weights_for_sampler(name, ttl_seconds=None)" type="OperationFuture[SaveWeightsResponse]">
  Publish the LoRA adapter to the paired sampling server under `name` without returning a snapshot-pinned `SamplingClient`. Use this when you don't need the version gate that `save_weights_and_get_sampling_client` provides.
</ParamField>

<ParamField body="save_weights_and_get_sampling_client(name)" type="_ComposedFuture[SamplingClient]">
  Publish the LoRA adapter to the paired sampling server under `name` and return a future that resolves to a `SamplingClient` pinned to the newly published version. Calling `.result()` runs two stages: the trainer publishes weights, then the SDK polls the sampler until at least one replica reports the new version loaded. The sampler-wait phase has a fixed 600-second ceiling independent of the `timeout=` you pass to `.result()`; if no replica reports the new version by then, the call raises `RuntimeError`. The returned `SamplingClient` carries `X-Min-Policy-Version` on every subsequent `sample()` call, so requests only land on replicas that have the right weights.
</ParamField>
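The sampler-wait stage behaves like a poll with a hard ceiling. The sketch below illustrates that behavior under stated assumptions and is not the SDK's internal code. `wait_for_version` and its `get_replica_versions` callback are hypothetical, the poll interval is arbitrary, and "new version loaded" is modeled as a replica reporting a version greater than or equal to the target.

```python
import time
from typing import Callable, Iterable


def wait_for_version(
    get_replica_versions: Callable[[], Iterable[int]],
    target_version: int,
    ceiling_s: float = 600.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Return once any replica reports target_version; raise after ceiling_s."""
    deadline = clock() + ceiling_s
    while True:
        # Assumption: a replica at or past the target counts as ready.
        if any(version >= target_version for version in get_replica_versions()):
            return
        if clock() >= deadline:
            raise RuntimeError(
                f"no replica reported version {target_version} within {ceiling_s}s"
            )
        sleep(interval_s)


# Two simulated polls: the second one shows a replica on version 2.
polls = iter([[1, 1], [1, 2]])
wait_for_version(lambda: next(polls), target_version=2, sleep=lambda s: None)
print("version 2 is live on at least one replica")
```

Since the ceiling is independent of the `timeout=` passed to `.result()`, callers that need a shorter bound would have to enforce it themselves.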

<ParamField body="load_state(path)" type="OperationFuture[LoadWeightsResponse]">
  Load weights from a `bt://loops:<run_id>/weights/<checkpoint>` URI into this trainer. Use this to resume training from a checkpoint.
</ParamField>

<ParamField body="load_state_with_optimizer(path)" type="OperationFuture[LoadWeightsResponse]">
  Same as `load_state`, but also restores the Adam moments. Use this when you want bit-exact resumption.
</ParamField>

<ParamField body="list_checkpoints()" type="list[Checkpoint]">
  List checkpoints for the run bound to this client. Requires that this client was constructed via `ServiceClient.create_lora_training_client` (which populates the necessary session and run IDs automatically).
</ParamField>

<ParamField body="get_checkpoint_archive_url(checkpoint_id, page_size=1000, page_token=0)" type="CheckpointFilesResponse">
  Return presigned URLs for every file in a checkpoint folder. Same semantics as `ServiceClient.get_checkpoint_archive_url`.
</ParamField>

<ParamField body="create_sampling_client(model_path)" type="SamplingClient">
  Return a `SamplingClient` bound to the paired sampler, loading the weights at `model_path` (a `bt://loops:<run_id>/sampler_weights/<checkpoint>` URI). Distinct from `ServiceClient.create_sampling_client`, which provisions a fresh sampler.
</ParamField>

<ParamField body="get_tokenizer()" type="PreTrainedTokenizer">
  Return the Hugging Face `PreTrainedTokenizer` for the base model. Cached after the first load.
</ParamField>

<ParamField body="get_info()" type="GetInfoResponse">
  Return the model configuration for this training session (base model name, LoRA rank, and max sequence length) without a server round-trip.
</ParamField>

<ParamField body="run_id" type="str | None">
  Property. The run ID this client is bound to. Use this when filtering checkpoints or making HTTP API calls against the same run.
</ParamField>

<ParamField body="policy_version" type="int">
  Property. The current policy version the trainer has published. Incremented on each `save_weights_and_get_sampling_client` (or `save_weights_for_sampler`) call.
</ParamField>

## SamplingClient: sample

`SamplingClient` generates text completions from the model the sampler currently has loaded. There are two creation paths with different version semantics: `ServiceClient.create_sampling_client` returns an auto-updating client that follows whatever weights the sampler currently holds, while `TrainingClient.save_weights_and_get_sampling_client` returns a snapshot client pinned to the trained version. Both clients expose the same `sample` method.

<ParamField body="sample(prompt, num_samples=1, sampling_params=None, include_prompt_logprobs=False, topk_prompt_logprobs=0)" type="SampleResult">
  Generate `num_samples` completions from `prompt` (a `ModelInput`). Pass a `SamplingParams` instance to control temperature, top-p, top-k, max tokens, seed, and stop sequences; omit it to use defaults. Set `include_prompt_logprobs=True` to get per-token log-probabilities for the input tokens alongside the output, and set `topk_prompt_logprobs` above `0` to also return the top-k alternatives at each prompt position. The sampler resolves which adapter or base model to serve from the version headers the client carries, so there is no per-call model override.
</ParamField>

<ParamField body="compute_logprobs(prompt)" type="list[float | None]">
  Return the per-token log-probabilities for `prompt` without generating any new tokens. Index 0 is always `None` because the first token has no preceding context to score against. Other positions may also be `None` if the sampler can't compute a log-probability for that token.
</ParamField>
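Since the returned list can contain `None` at index 0 and elsewhere, any aggregate has to skip those positions. A minimal sketch, where `summarize_logprobs` is a hypothetical helper and the input list is made-up data rather than real sampler output:

```python
import math
from typing import Optional, Sequence


def summarize_logprobs(logprobs: Sequence[Optional[float]]) -> dict:
    """Average the scored positions and derive perplexity, skipping None."""
    scored = [lp for lp in logprobs if lp is not None]
    if not scored:
        return {"scored_tokens": 0, "mean_logprob": None, "perplexity": None}
    mean = sum(scored) / len(scored)
    return {
        "scored_tokens": len(scored),
        "mean_logprob": mean,
        "perplexity": math.exp(-mean),
    }


# Index 0 is None, and one later position could not be scored.
example = [None, -0.5, -1.5, None, -1.0]
print(summarize_logprobs(example))
```

Here three positions are scored, the mean log-probability is -1.0, and the perplexity is therefore e.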

<ParamField body="discover_base_model_name()" type="str">
  Return the base model ID from the sampler's `/v1/models` list, specifically the entry with no parent. Retries with backoff while the sampler is still deploying.
</ParamField>

<ParamField body="discover_adapter_name()" type="str | None">
  Return the currently registered LoRA adapter ID (the first `/v1/models` entry with a non-null parent), or `None` if no adapter is loaded.
</ParamField>
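The selection rules for the two discovery methods can be sketched over a parsed `/v1/models` list. The entry shape is an assumption: this sketch treats each entry as a dict with `id` and `parent` keys, which this page does not spell out. `pick_base_model` and `pick_adapter` are hypothetical helpers, and the sketch omits the retry-with-backoff behavior.

```python
from typing import List, Optional


def pick_base_model(models: List[dict]) -> Optional[str]:
    """Return the ID of the first entry with no parent."""
    for entry in models:
        if entry.get("parent") is None:
            return entry["id"]
    return None


def pick_adapter(models: List[dict]) -> Optional[str]:
    """Return the ID of the first entry with a non-null parent, if any."""
    for entry in models:
        if entry.get("parent") is not None:
            return entry["id"]
    return None


# Made-up listing: one base model and one adapter registered on top of it.
models = [
    {"id": "Qwen/Qwen3.5-2B", "parent": None},
    {"id": "step-1", "parent": "Qwen/Qwen3.5-2B"},
]
print(pick_base_model(models), pick_adapter(models))
```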

<ParamField body="get_base_model()" type="str">
  Return the base model ID this sampling client was created with, without contacting the server.
</ParamField>

<ParamField body="get_tokenizer()" type="PreTrainedTokenizer">
  Return the Hugging Face `PreTrainedTokenizer` for the base model this client was created with.
</ParamField>

## Types

### Training inputs

<ParamField body="Datum">
  A single training example: a `ModelInput` paired with a dict of `TensorData` loss function inputs.
</ParamField>
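The token and target layout used in the minimal example can be factored into a small helper. The sketch below mirrors that example: prompt positions get `-100` so they are masked from the loss, and answer positions carry their own token IDs. `build_masked_example` is a hypothetical helper, the token IDs are made up, and this page does not state whether the server shifts targets internally.

```python
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # value used for masked positions in the minimal example


def build_masked_example(
    prompt_ids: Sequence[int], answer_ids: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and answer, masking prompt positions in the targets."""
    tokens = list(prompt_ids) + list(answer_ids)
    targets = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return tokens, targets


tokens, targets = build_masked_example([101, 102, 103], [7, 8])
print(tokens)   # [101, 102, 103, 7, 8]
print(targets)  # [-100, -100, -100, 7, 8]
```

The two lists have the same length, so `tokens` can go to `ModelInput.from_ints` and `targets` into the `target_tokens` entry of `loss_fn_inputs`, as in the minimal example.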

<ParamField body="ModelInput">
  A tokenized prompt, represented as a list of `ModelInputChunk` objects. Construct with `ModelInput.from_ints(token_ids)` for the common case.
</ParamField>

<ParamField body="ModelInputChunk">
  A discriminated union of `EncodedTextChunk` (a list of token IDs) and `ImageChunk` (a base64-encoded image with an expected token count).
</ParamField>

<ParamField body="TensorData">
  A serializable tensor with a flat data list, a dtype string, and a shape. Convert to and from `torch.Tensor` with `TensorData.to_torch()` and `TensorData.from_torch(tensor)`.
</ParamField>
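When the source data is a nested Python list rather than a `torch.Tensor`, the flat data list and shape have to be derived by hand. A minimal sketch, assuming rectangular nested lists: `flatten_with_shape` is a hypothetical helper, and the resulting dict only mirrors the `data`, `dtype`, and `shape` keyword arguments used in the minimal example.

```python
import math
from typing import Any, List, Tuple


def flatten_with_shape(nested: Any) -> Tuple[List[Any], List[int]]:
    """Flatten a rectangular nested list and report its shape."""
    # Walk down the first element of each level to read off the dimensions.
    shape: List[int] = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None

    # Peel off one nesting level per extra dimension.
    flat = nested if shape else [nested]
    for _ in range(len(shape) - 1):
        flat = [item for row in flat for item in row]

    # A size mismatch means some row was shorter or longer than the first.
    if len(flat) != math.prod(shape):
        raise ValueError("nested list is not rectangular")
    return flat, shape


data, shape = flatten_with_shape([[1, 2, 3], [4, 5, 6]])
tensor_kwargs = {"data": data, "dtype": "int64", "shape": shape}
print(tensor_kwargs)
```

For the 2x3 input above, `data` is the six values in row-major order and `shape` is `[2, 3]`.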

### Configuration

<ParamField body="SamplingParams">
  Controls for text generation: `temperature`, `top_p`, `top_k`, `max_tokens`, `seed`, and `stop`.
</ParamField>

<ParamField body="AdamParams">
  Optimizer hyperparameters: `learning_rate`, `beta1`, `beta2`, `eps`, `weight_decay`, and `grad_clip_norm`.
</ParamField>

<ParamField body="WandbConfig">
  Optional Weights & Biases settings (`project` and an optional run `name`) passed to `create_lora_training_client` to stream training metrics.
</ParamField>

### Results and handles

<ParamField body="SampleResult">
  The full response from `sample()`: a list of `SampledSequence` objects in `sequences`, the `policy_version` the sampler replica was running, and `prompt_logprobs` / `topk_prompt_logprobs` populated when the matching `sample()` flags are set.
</ParamField>

<ParamField body="SampledSequence">
  A single generated sequence: a list of output token IDs, optional per-token log-probabilities, and a stop reason.
</ParamField>

<ParamField body="Checkpoint">
  Metadata for a saved checkpoint, populated by `list_checkpoints()`.
</ParamField>

<ParamField body="CheckpointFilesResponse">
  A paginated list of presigned file URLs for a checkpoint, populated by `get_checkpoint_archive_url()`.
</ParamField>

<ParamField body="CheckpointFile">
  One entry in a `CheckpointFilesResponse.presigned_urls` list: a presigned URL plus `relative_file_name`, `node_rank`, `size_bytes`, and `last_modified` metadata.
</ParamField>
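The per-file metadata makes it easy to size a checkpoint before downloading it. The sketch below groups `size_bytes` by `node_rank`. It assumes the documented fields are readable as attributes, uses `SimpleNamespace` objects as stand-ins for `CheckpointFile`, and the file names are made up. `bytes_per_node` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def bytes_per_node(files: Iterable[Any]) -> Dict[int, int]:
    """Sum size_bytes for each node_rank across checkpoint files."""
    totals: Dict[int, int] = defaultdict(int)
    for entry in files:
        totals[entry.node_rank] += entry.size_bytes
    return dict(totals)


# Stand-in entries with made-up names and sizes.
files = [
    SimpleNamespace(relative_file_name="rank0/part-0.bin", node_rank=0, size_bytes=1024),
    SimpleNamespace(relative_file_name="rank0/part-1.bin", node_rank=0, size_bytes=2048),
    SimpleNamespace(relative_file_name="rank1/part-0.bin", node_rank=1, size_bytes=512),
]
print(bytes_per_node(files))
```

With a real response, `files` would be the `presigned_urls` list from a `CheckpointFilesResponse`.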

<ParamField body="ServerCapabilities, SupportedModel">
  Returned by `ServiceClient.get_server_capabilities()`; describe which base models the control plane can provision and on which GPU classes.
</ParamField>

<ParamField body="OperationFuture[T]">
  A handle to a long-running training operation. Call `.result()` or `.result(timeout=seconds)` to block until the operation completes and return the result. The `forward` and `forward_backward` methods return a `ForwardBackwardFuture` subclass, and `save_weights_and_get_sampling_client` returns a composed future; both expose the same `.result()` contract.
</ParamField>

<ParamField body="ForwardBackwardOutput, OptimStepResponse, SaveWeightsResponse, SaveWeightsForSamplerResponse, LoadWeightsResponse, InitTrainerServerResponse, SampleResponse">
  Response payloads returned by the matching `TrainingClient` and `SamplingClient` methods.
</ParamField>

## Errors

<ParamField body="RemoteOpError">
  The server reports that an async operation failed. The `error_class` attribute carries the server-side exception class name (for example, `"ValueError"` or `"DispatcherError"`), which is useful for routing in caller code.
</ParamField>

<ParamField body="UnknownRequestError">
  The server returned 404 for an operation ID. This can mean the server has no record of the operation (after a pod restart, for example) or that the result was TTL-evicted. Resubmit the operation if the work is still needed; the server's idempotency-key deduplication prevents double-execution.
</ParamField>

<ParamField body="ServerShutdownError">
  The server is shutting down (503 response). Retry the request against a different replica.
</ParamField>
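One way to act on these three errors is a small retry wrapper: resubmit on `UnknownRequestError`, retry on `ServerShutdownError`, and let `RemoteOpError` propagate so the caller can route on `error_class`. The sketch below is a policy illustration under assumptions. The exception classes are local stand-ins because this page does not give their import path or constructor signatures, `run_with_retries` is a hypothetical helper, and choosing a different replica is left to the infrastructure.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


# Local stand-ins for the SDK exceptions so the sketch runs on its own.
class RemoteOpError(Exception):
    def __init__(self, message: str, error_class: str) -> None:
        super().__init__(message)
        self.error_class = error_class


class UnknownRequestError(Exception):
    pass


class ServerShutdownError(Exception):
    pass


def run_with_retries(submit_and_wait: Callable[[], T], max_attempts: int = 3) -> T:
    """Resubmit on the two retryable errors; let RemoteOpError propagate."""
    for attempt in range(1, max_attempts + 1):
        try:
            # The callable submits a fresh operation and waits for its result.
            return submit_and_wait()
        except (UnknownRequestError, ServerShutdownError):
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


attempts = []


def flaky() -> str:
    """Simulated operation: fails twice with retryable errors, then succeeds."""
    attempts.append(len(attempts) + 1)
    if len(attempts) == 1:
        raise ServerShutdownError("503")
    if len(attempts) == 2:
        raise UnknownRequestError("404")
    return "ok"


print(run_with_retries(flaky))
```

The wrapper takes a callable that performs both the submit and the `.result()` wait, so each retry submits a new operation instead of polling a stale operation ID.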

## Tinker compatibility shim

Install with the `[tinker]` extra (`uv add 'baseten-loops[tinker]'`) and import `tinker`. The shim wheel maps Tinker's client interface onto the Loops SDK, so existing training scripts that import from `tinker` run without modification. The underlying classes (`ServiceClient`, `TrainingClient`, `SamplingClient`) and every method on them are the same; only the import path changes. For the full list of mapped names and any behavioral differences, see the [Tinker compatibility guide](/loops/tinker-compatibility).
