The Loops Python SDK exposes three classes used in training scripts. ServiceClient provisions trainer and sampling servers on the Baseten control plane and manages the session that ties them together. TrainingClient runs forward passes, backward passes, and optimizer steps against a live trainer server. SamplingClient generates completions from the weights the sampler has loaded; a client returned by save_weights_and_get_sampling_client is pinned to the version it was published from, so you sample from exactly the checkpoint you trained. These three classes mirror their Tinker counterparts, and the method names are the same.

Installation

The main client package is baseten-loops. The Tinker compatibility shim ships as the [tinker] extra, which pulls in the baseten-loops-tinker wheel and provides the tinker namespace. From a uv project (see the quickstart for project setup), install both with the extra:
uv add 'baseten-loops[tinker]'
Then import the clients from the package:
from baseten.loops import ServiceClient, TrainingClient, SamplingClient

# Tinker-compatible namespace (provided by baseten-loops[tinker])
import tinker

Minimal example

One LoRA round trip: provision a trainer, run a forward and backward pass over a single masked prompt-and-answer pair, step the optimizer, publish the weights, and sample from them.
from baseten.loops import (
    ServiceClient, Datum, ModelInput, TensorData, AdamParams, SamplingParams,
)

service_client = ServiceClient()
training_client = service_client.create_lora_training_client(
    base_model="Qwen/Qwen3.5-2B", rank=16,
)

# Tokenize one prompt/answer pair and mask the prompt positions from the loss.
tokenizer = training_client.get_tokenizer()
prompt = tokenizer.encode("What is the capital of France?\nAnswer:", add_special_tokens=False)
answer = tokenizer.encode(" Paris", add_special_tokens=False)
tokens = prompt + answer
targets = [-100] * len(prompt) + answer

datum = Datum(
    model_input=ModelInput.from_ints(tokens),
    loss_fn_inputs={"target_tokens": TensorData(data=targets, dtype="int64", shape=[len(targets)])},
)

training_client.forward_backward(data=[datum]).result(timeout=600.0)
training_client.optim_step(AdamParams(learning_rate=4e-5)).result(timeout=600.0)
sampling_client = training_client.save_weights_and_get_sampling_client(name="step-1").result(timeout=600.0)

result = sampling_client.sample(
    prompt=ModelInput.from_ints(prompt), num_samples=1, sampling_params=SamplingParams(max_tokens=16),
)
print(result.sequences[0].tokens)
Each long-running call is submit-then-.result(): the submit validates and returns immediately, and .result() long-polls until the operation finishes. Provisioning the trainer can take several minutes on a fresh base model, so the first call blocks the longest. The sampling_client returned by save_weights_and_get_sampling_client is pinned to the version you just trained, so the sample reflects this step’s weights. For the end-to-end walkthrough with expected output, see the Loops quickstart.
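The prompt masking in the example above can be factored into a small helper. The sketch below is plain Python and does not call the SDK; build_masked_example is a hypothetical name, and the only convention it relies on is the -100 sentinel used in the example.

```python
from typing import List, Sequence, Tuple

# Same sentinel the minimal example uses for positions excluded from the loss.
IGNORE_INDEX = -100


def build_masked_example(
    prompt_ids: Sequence[int], answer_ids: Sequence[int]
) -> Tuple[List[int], List[int]]:
    """Return (tokens, targets) with every prompt position masked out.

    Hypothetical helper, not part of the SDK: it only reproduces the list
    arithmetic from the minimal example.
    """
    if not answer_ids:
        raise ValueError("answer_ids must contain at least one token to train on")
    tokens = list(prompt_ids) + list(answer_ids)
    targets = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return tokens, targets


tokens, targets = build_masked_example([101, 102, 103], [7, 8])
print(tokens)   # [101, 102, 103, 7, 8]
print(targets)  # [-100, -100, -100, 7, 8]
```

The two lists always have the same length, which is what the shape=[len(targets)] argument in the example depends on.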

ServiceClient: provision

ServiceClient is the entry point for every session. It calls the Baseten control plane to create a TrainerSession, then provisions trainer and sampling servers within that session on demand.
ServiceClient(training_project_id=None, *, api_key=None, base_url=None, reuse_from_session_id=None)
ServiceClient
Construct a ServiceClient and create a new TrainerSession on the Baseten control plane. Omit training_project_id to use the default project for the org, or pass one to target a specific training project. api_key defaults to the BASETEN_API_KEY environment variable.
Pass reuse_from_session_id to reuse a prior session’s trainer and sampler for create_lora_training_client and create_sampling_client calls instead of provisioning fresh. The named session must belong to the same team. ServiceClient reads the LOOPS_REUSE_FROM_SESSION_ID environment variable when no kwarg is passed; the kwarg wins when both are set. Reuse is best-effort: if the prior deployment is stopped, failed, or unhealthy, a fresh one is provisioned and the call still succeeds. See Reusing infrastructure across sessions.
ServiceClient.local(*, trainer_url, sampler_url)
ServiceClient
Bind to already-running local trainer and sampler processes without contacting the control plane. Pass trainer_url and sampler_url as the base URLs of local server processes. Useful for end-to-end testing.
create_lora_training_client(base_model, rank=32, seed=None, timeout=600.0, ready_timeout=3600.0, wandb_config=None)
TrainingClient
Provision a TrainerServer for the given Hugging Face base_model and return a connected TrainingClient. The control plane also provisions a paired sampling server in the same call; save_weights_and_get_sampling_client uses that paired URL to gate on version readiness. Pass a WandbConfig instance to stream training metrics to a Weights & Biases run.
create_sampling_client(base_model, timeout=300.0, ready_timeout=3600.0, model_path=None)
SamplingClient
Provision a standalone SamplingServer for base_model and return a connected SamplingClient. Use this when you want to sample from a base model independently of a training run. The model_path argument is reserved and not yet implemented; passing it raises NotImplementedError. To sample from a specific checkpoint, use TrainingClient.create_sampling_client(model_path=...) on a live run instead.
get_server_capabilities()
ServerCapabilities
Return the control plane’s view of supported base models and the GPU classes it can provision them on. Useful for confirming a base model is available before calling create_lora_training_client.
list_checkpoints(run_id)
list[Checkpoint]
List checkpoints saved by the run identified by run_id. Calls the Baseten API, not the trainer server directly.
get_checkpoint_archive_url(checkpoint_id, page_size=1000, page_token=0)
CheckpointFilesResponse
Return presigned URLs for every file in the specified checkpoint folder. Checkpoint IDs are globally unique, so no run scoping is required. The Loops stack writes checkpoints as unzipped directories rather than archives, so this returns a file list instead of a single archive URL.
session_id
str
Property. The session ID assigned by the control plane. Available after construction.
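The precedence between the reuse_from_session_id kwarg and the LOOPS_REUSE_FROM_SESSION_ID environment variable, described for the constructor above, can be illustrated with a few lines of plain Python. This is a sketch of the documented rule, not the SDK's implementation; resolve_reuse_session_id is a hypothetical name.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "LOOPS_REUSE_FROM_SESSION_ID"


def resolve_reuse_session_id(
    reuse_from_session_id: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Mirror the documented rule: the kwarg wins, else the env var, else None.

    Hypothetical helper for illustration; the SDK resolves this internally.
    """
    env = os.environ if environ is None else environ
    if reuse_from_session_id is not None:
        return reuse_from_session_id
    return env.get(ENV_VAR)


# The kwarg wins when both are set.
print(resolve_reuse_session_id("sess-kwarg", {ENV_VAR: "sess-env"}))  # sess-kwarg
# The environment variable is used only when no kwarg is passed.
print(resolve_reuse_session_id(None, {ENV_VAR: "sess-env"}))  # sess-env
```

Logging the resolved value before constructing ServiceClient is one way to make it obvious which session a script will try to reuse.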

TrainingClient: train

TrainingClient talks directly to a dp_worker instance. Long-running operations use a submit-and-retrieve protocol: the submit fires immediately on the calling thread (so validation errors surface at call time) and .result() long-polls the server until the operation finishes. You can submit multiple operations before awaiting any of them.
Every long-running server operation on ServiceClient, TrainingClient, and SamplingClient (for example, forward_backward, sample, create_lora_training_client) has an awaitable *_async counterpart for callers running their own event loop. The async variants accept the same arguments as their synchronous counterparts. Simpler blocking calls like health, ensure_ready, get_tokenizer, and close (whose async form is aclose) have no *_async twin.
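Because submits return immediately, one way to do gradient accumulation is to submit every micro-batch first, await the futures afterwards, and then step the optimizer once. The sketch below assumes only the documented forward_backward(data=...), optim_step(adam_params), and .result(timeout=...) calls; accumulate_and_step is a hypothetical helper, and the client is duck-typed so the function itself has no SDK import.

```python
from typing import Any, List, Sequence


def accumulate_and_step(
    training_client: Any,
    micro_batches: Sequence[list],
    adam_params: Any,
    timeout: float = 600.0,
) -> List[Any]:
    """Submit all passes, await them in order, then apply one optimizer step.

    Hypothetical helper: training_client is expected to behave like the
    TrainingClient documented here, and each micro-batch is a list of Datum.
    """
    if not micro_batches:
        raise ValueError("need at least one micro-batch before stepping")
    # Each submit validates its input and returns a future right away.
    futures = [training_client.forward_backward(data=batch) for batch in micro_batches]
    # Await in submission order; the outputs are returned to the caller as-is.
    outputs = [future.result(timeout=timeout) for future in futures]
    # A single optimizer step applies the gradients accumulated by the passes.
    training_client.optim_step(adam_params).result(timeout=timeout)
    return outputs
```

Validation errors for a bad micro-batch surface inside the first list comprehension, before any .result() call is made.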
forward_backward(data, loss_fn="cross_entropy", loss_fn_config=None)
ForwardBackwardFuture
Run a forward and backward pass over data (a list of Datum objects) using the specified loss function. Returns a ForwardBackwardFuture; call .result() to block until the pass completes and retrieve the loss.
forward(data, loss_fn="cross_entropy", loss_fn_config=None)
ForwardBackwardFuture
Run a forward pass without gradient computation. Same inputs and output shape as forward_backward, but the gradient buffer is left untouched, so it is safe to interleave with gradient accumulation steps.
optim_step(adam_params)
OperationFuture[OptimStepResponse]
Apply the accumulated gradients using the Adam optimizer configured by adam_params. Call this after one or more forward_backward calls.
save_state(name, ttl_seconds=None)
OperationFuture[SaveWeightsResponse]
Persist a local training checkpoint under name. When a weight sync URI is configured server-side, save_state also publishes the LoRA adapter so a polling sampler can hot-swap to the new weights.
save_weights_for_sampler(name, ttl_seconds=None)
OperationFuture[SaveWeightsResponse]
Publish the LoRA adapter to the paired sampling server under name without returning a snapshot-pinned SamplingClient. Use this when you don’t need the version gate that save_weights_and_get_sampling_client provides.
save_weights_and_get_sampling_client(name)
_ComposedFuture[SamplingClient]
Publish the LoRA adapter to the paired sampling server under name and return a future that resolves to a SamplingClient pinned to the newly published version. Calling .result() runs two stages: the trainer publishes weights, then the SDK polls the sampler until at least one replica reports the new version loaded. The sampler-wait phase has a fixed 600-second ceiling independent of the timeout= you pass to .result(); if no replica reports the new version by then, the call raises RuntimeError. The returned SamplingClient carries X-Min-Policy-Version on every subsequent sample() call, so requests only land on replicas that have the right weights.
load_state(path)
OperationFuture[LoadWeightsResponse]
Load weights from a bt://loops:<run_id>/weights/<checkpoint> URI into this trainer. Use it to resume training from a checkpoint.
load_state_with_optimizer(path)
OperationFuture[LoadWeightsResponse]
Same as load_state but also restores Adam moments. Use it when you want bit-exact resumption.
list_checkpoints()
list[Checkpoint]
List checkpoints for the run bound to this client. Requires that this client was constructed using ServiceClient.create_lora_training_client (which populates the necessary session and run IDs automatically).
get_checkpoint_archive_url(checkpoint_id, page_size=1000, page_token=0)
CheckpointFilesResponse
Return presigned URLs for every file in a checkpoint folder. Same semantics as ServiceClient.get_checkpoint_archive_url.
create_sampling_client(model_path)
SamplingClient
Return a SamplingClient bound to the paired sampler, loading the weights at model_path (a bt://loops:<run_id>/sampler_weights/<checkpoint> URI). Distinct from ServiceClient.create_sampling_client, which provisions a fresh sampler.
get_tokenizer()
PreTrainedTokenizer
Return the Hugging Face PreTrainedTokenizer for the base model. Cached after the first load.
get_info()
GetInfoResponse
Return the model configuration for this training session (base model name, LoRA rank, and max sequence length) without a server round-trip.
run_id
str | None
Property. The run ID this client is bound to. Use this when filtering checkpoints or making HTTP API calls against the same run.
policy_version
int
Property. The current policy version the trainer has published. Incremented on each save_weights_and_get_sampling_client (or save_weights_for_sampler) call.
init_trainer_server(lora_rank)
OperationFuture[InitTrainerServerResponse]
Reset trainer state to a fresh LoRA adapter at lora_rank. Use it to start a new adapter on an existing trainer without reprovisioning.
health()
None
Check the trainer’s /health endpoint. Returns None on success and raises if the trainer is unreachable or unhealthy.
close()
None
Close the client’s HTTP connections and finish any active Weights & Biases run. Pure-async callers can use aclose() instead, which closes connections directly on the running event loop.
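Two URI shapes appear in the method descriptions above: bt://loops:<run_id>/weights/<checkpoint> for load_state and load_state_with_optimizer, and bt://loops:<run_id>/sampler_weights/<checkpoint> for create_sampling_client. The sketch below builds both from their parts. The helper names are hypothetical, and whether the checkpoint segment equals the name passed to save_state or save_weights_for_sampler is an assumption to verify against list_checkpoints().

```python
def _checked(part: str, label: str) -> str:
    # Reject values that would produce a malformed URI.
    if not part or "/" in part:
        raise ValueError(f"{label} must be a non-empty string without '/'")
    return part


def trainer_state_uri(run_id: str, checkpoint: str) -> str:
    """URI shape documented for load_state and load_state_with_optimizer."""
    return f"bt://loops:{_checked(run_id, 'run_id')}/weights/{_checked(checkpoint, 'checkpoint')}"


def sampler_weights_uri(run_id: str, checkpoint: str) -> str:
    """URI shape documented for TrainingClient.create_sampling_client."""
    return f"bt://loops:{_checked(run_id, 'run_id')}/sampler_weights/{_checked(checkpoint, 'checkpoint')}"


# "run123" and "step-1" are placeholder values for illustration.
print(trainer_state_uri("run123", "step-1"))    # bt://loops:run123/weights/step-1
print(sampler_weights_uri("run123", "step-1"))  # bt://loops:run123/sampler_weights/step-1
```

Keeping the two builders separate makes it harder to pass a trainer checkpoint path where a sampler path is expected.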

SamplingClient: sample

SamplingClient generates text completions from the model the sampler currently has loaded. There are two creation paths with different version semantics: ServiceClient.create_sampling_client returns an auto-updating client that follows whatever weights the sampler currently holds, while TrainingClient.save_weights_and_get_sampling_client returns a snapshot client pinned to the trained version. Both clients expose the same sample method.
sample(prompt, num_samples=1, sampling_params=None, include_prompt_logprobs=False, topk_prompt_logprobs=0)
SampleResult
Generate num_samples completions from prompt (a ModelInput). Pass a SamplingParams instance to control temperature, top-p, top-k, max tokens, seed, and stop sequences; omit it to use defaults. Set include_prompt_logprobs=True to get per-token log-probabilities for the input tokens alongside the output, and set topk_prompt_logprobs above 0 to also return the top-k alternatives at each prompt position. The sampler resolves which adapter or base model to serve from the version headers the client carries, so there is no per-call model override.
compute_logprobs(prompt)
list[float | None]
Return the per-token log-probabilities for prompt without generating any new tokens. Index 0 is always None because the first token has no preceding context to score against. Other positions may also be None if the sampler can’t compute a log-probability for that token.
discover_base_model_name()
str
Return the base model ID from the sampler’s /v1/models list, specifically the entry with no parent. Retries with backoff while the sampler is still deploying.
discover_adapter_name()
str | None
Return the currently registered LoRA adapter ID (the first /v1/models entry with a non-null parent), or None if no adapter is loaded.
get_base_model()
str
Return the base model ID this sampling client was created with, without contacting the server.
get_tokenizer()
PreTrainedTokenizer
Return the Hugging Face PreTrainedTokenizer for the base model this client was created with.
ensure_ready(ready_timeout=None)
None
Block until the sampler’s deployment status is ACTIVE. A scaled-to-zero deployment triggers one wake; terminal-failure states raise. No-op for local deployments.
ensure_ready_for_deployment(*, base_model, deployment, api_key=None, ready_timeout=3600.0)
None
Class method. Block until deployment reports ready, using a throwaway SamplingClient so you can wait without holding one. Polls up to ready_timeout seconds and applies the same readiness semantics as ensure_ready.
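Since compute_logprobs returns None at index 0 and possibly at other positions, any aggregate over its output has to skip those entries. The sketch below shows one way to turn such a list into a total log-probability and a perplexity. It is plain Python operating on the documented list[float | None] shape; summarize_logprobs is a hypothetical helper.

```python
import math
from typing import List, Optional, Tuple


def summarize_logprobs(logprobs: List[Optional[float]]) -> Tuple[float, int, float]:
    """Return (total_logprob, scored_token_count, perplexity), skipping None.

    Hypothetical helper: perplexity is computed over the scored positions only.
    """
    scored = [lp for lp in logprobs if lp is not None]
    if not scored:
        raise ValueError("no scored positions in logprobs")
    total = sum(scored)
    perplexity = math.exp(-total / len(scored))
    return total, len(scored), perplexity


# Index 0 is None by contract; the second None stands in for an unscorable token.
total, count, perplexity = summarize_logprobs([None, -0.5, None, -1.5])
print(total, count)  # -2.0 2
```

Reporting the scored-token count alongside the perplexity makes it visible when many positions came back as None.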

Types

Training inputs

Datum
A single training example: a ModelInput paired with a dict of TensorData loss function inputs.
ModelInput
A tokenized prompt, represented as a list of ModelInputChunk objects. Construct with ModelInput.from_ints(token_ids) for the common case.
ModelInputChunk
A discriminated union of EncodedTextChunk (a list of token IDs) and ImageChunk (a base64-encoded image with an expected token count).
TensorData
A serializable tensor with a flat data list, a dtype string, and a shape. Convert to and from torch.Tensor with TensorData.to_torch() and TensorData.from_torch(tensor).
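TensorData stores a flat data list plus a shape, so nested Python lists have to be flattened before they are passed in. The sketch below derives both values from a rectangular nested list without importing torch or the SDK; flatten_with_shape is a hypothetical helper, and the result would be passed as TensorData(data=flat, dtype=..., shape=shape) in the way the minimal example does.

```python
from typing import Any, List, Tuple


def flatten_with_shape(nested: Any) -> Tuple[List[Any], List[int]]:
    """Return (flat, shape) for a rectangular nested list in row-major order.

    Hypothetical helper: raises ValueError if the nesting is ragged.
    """
    # Infer the shape by following the first element at each depth.
    shape: List[int] = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]

    flat: List[Any] = []

    def walk(node: Any, depth: int) -> None:
        if depth == len(shape):
            if isinstance(node, list):
                raise ValueError("ragged input: found a list at leaf depth")
            flat.append(node)
            return
        if not isinstance(node, list) or len(node) != shape[depth]:
            raise ValueError("ragged input: inconsistent length")
        for child in node:
            walk(child, depth + 1)

    walk(nested, 0)
    return flat, shape


flat, shape = flatten_with_shape([[1, 2, 3], [4, 5, 6]])
print(flat)   # [1, 2, 3, 4, 5, 6]
print(shape)  # [2, 3]
```

When torch is already a dependency, TensorData.from_torch(tensor) covers the same need without a helper.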

Configuration

SamplingParams
Controls for text generation: temperature, top_p, top_k, max_tokens, seed, and stop.
AdamParams
Optimizer hyperparameters: learning_rate, beta1, beta2, eps, weight_decay, and grad_clip_norm.
WandbConfig
Optional Weights & Biases settings (project and an optional run name) passed to create_lora_training_client to stream training metrics.

Results and handles

SampleResult
The full response from sample(): a list of SampledSequence objects in sequences, the policy_version the sampler replica was running, and prompt_logprobs / topk_prompt_logprobs populated when the matching sample() flags are set.
SampledSequence
A single generated sequence: a list of output token IDs, optional per-token log-probabilities, and a stop reason.
Checkpoint
Metadata for a saved checkpoint, populated by list_checkpoints().
CheckpointFilesResponse
A paginated list of presigned file URLs for a checkpoint, populated by get_checkpoint_archive_url().
CheckpointFile
One entry in a CheckpointFilesResponse.presigned_urls list: a presigned URL plus relative_file_name, node_rank, size_bytes, and last_modified metadata.
ServerCapabilities, SupportedModel
Returned by ServiceClient.get_server_capabilities(); describe which base models the control plane can provision and on which GPU classes.
OperationFuture[T]
A handle to a long-running training operation. Call .result() or .result(timeout=seconds) to block until the operation completes and return the result. The forward and forward_backward methods return a ForwardBackwardFuture subclass, and save_weights_and_get_sampling_client returns a composed future; both expose the same .result() contract.
ForwardBackwardOutput, OptimStepResponse, SaveWeightsResponse, SaveWeightsForSamplerResponse, LoadWeightsResponse, InitTrainerServerResponse, SampleResponse
Response payloads returned by the matching TrainingClient and SamplingClient methods.

Errors

RemoteOpError
The server reports that an async operation failed. The error_class attribute carries the server-side exception class name (for example, "ValueError" or "DispatcherError"), which is useful for routing in caller code.
UnknownRequestError
The server returned 404 for an operation ID. This can mean the server has no record of the operation (after a pod restart, for example) or that the result was TTL-evicted. Resubmit the operation if the work is still needed; the server’s idempotency-key deduplication prevents double-execution.
ServerShutdownError
The server is shutting down (503 response). Retry the request against a different replica.
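The three errors above suggest different reactions: resubmit on UnknownRequestError, retry on ServerShutdownError, and route on error_class for RemoteOpError. The sketch below shows one way to wire that up. The exception classes are local stand-ins defined so the block runs without the SDK (their import path is not given in this reference), run_with_recovery is a hypothetical helper, and routing the retry to a different replica is assumed to be handled by the infrastructure rather than by this code.

```python
import time
from typing import Any, Callable, Optional


# Local stand-ins for the SDK exceptions of the same names (assumption: in real
# code these would be imported from the SDK instead of redefined).
class RemoteOpError(Exception):
    def __init__(self, message: str, error_class: str) -> None:
        super().__init__(message)
        self.error_class = error_class


class UnknownRequestError(Exception):
    pass


class ServerShutdownError(Exception):
    pass


def run_with_recovery(
    submit: Callable[[], Any],
    *,
    timeout: float = 600.0,
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call submit().result(), resubmitting on the two retryable errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit().result(timeout=timeout)
        except (UnknownRequestError, ServerShutdownError) as exc:
            # Resubmitting relies on the server-side idempotency-key
            # deduplication described above.
            last_error = exc
            if attempt < max_attempts:
                sleep(backoff_seconds * attempt)
        except RemoteOpError as exc:
            # Route on the server-side exception class name.
            if exc.error_class == "ValueError":
                raise ValueError(str(exc)) from exc
            raise
    assert last_error is not None
    raise last_error
```

A call site would look like run_with_recovery(lambda: training_client.forward_backward(data=[datum])), so that each retry performs a fresh submit rather than polling a stale operation ID.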

Tinker compatibility shim

Install with the [tinker] extra (uv add 'baseten-loops[tinker]') and import tinker. The shim wheel maps Tinker’s client interface onto the Loops SDK, so existing training scripts that import from tinker run without modification. The underlying classes (ServiceClient, TrainingClient, SamplingClient) and every method on them are the same as in baseten.loops; only the import path differs. For the full list of mapped names and any behavioral differences, see the Tinker compatibility guide.