The Loops Python SDK exposes three classes for use in training scripts. ServiceClient provisions trainer and sampling servers on the Baseten control plane and manages the session that ties them together. TrainingClient runs forward passes, backward passes, and optimizer steps against a live trainer server. SamplingClient generates completions from the current weights; a client returned by save_weights_and_get_sampling_client is pinned to the published policy version, so you sample from exactly the checkpoint you trained. The three classes mirror the Tinker shapes, and the methods you call have the same names.

Installation

The main client package is baseten-loops. The Tinker compatibility shim ships as the [tinker] extra, which pulls in the baseten-loops-tinker wheel and provides the tinker namespace. From a uv project (see the quickstart for project setup), install both with the extra:
uv add 'baseten-loops[tinker]'

Then import the classes in your training script:

from baseten.loops import ServiceClient, TrainingClient, SamplingClient

# Tinker-compatible namespace (provided by baseten-loops[tinker])
import tinker

ServiceClient: provision

ServiceClient is the entry point for every session. It calls the Baseten control plane to create a TrainerSession, then provisions trainer and sampling servers within that session on demand.

ServiceClient(training_project_id=None, *, api_key=None, base_url=None, reuse_from_session_id=None)

Construct a ServiceClient and create a new TrainerSession on the Baseten control plane. Omit training_project_id to use the default project for the org, or pass one to target a specific training project. api_key defaults to the BASETEN_API_KEY environment variable. Pass reuse_from_session_id to reuse a prior session’s trainer and sampler for create_lora_training_client and create_sampling_client calls instead of provisioning fresh ones. The named session must belong to the same team. When the kwarg is not passed, ServiceClient reads the LOOPS_REUSE_FROM_SESSION_ID environment variable; when both are set, the kwarg wins. Reuse is best-effort: if the prior deployment is stopped, failed, or unhealthy, a fresh one is provisioned and the call still succeeds. See Reusing infrastructure across sessions.
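The precedence between the keyword argument and the environment variable can be pictured with a small helper. This is only a sketch of the rule described above, not the SDK's own code: resolve_reuse_session_id is a hypothetical name, and treating an empty environment value as "no reuse" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_reuse_session_id(
    reuse_from_session_id: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the session to reuse, kwarg first, then env var."""
    env = os.environ if environ is None else environ
    if reuse_from_session_id is not None:
        # An explicit keyword argument wins over the environment.
        return reuse_from_session_id
    # Fall back to the environment variable. Treating an unset or empty
    # value as "no reuse" is an assumption of this sketch.
    return env.get("LOOPS_REUSE_FROM_SESSION_ID") or None
```

Passing the environment as a mapping keeps the helper easy to exercise in tests without touching the real process environment.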

ServiceClient.local(*, trainer_url, sampler_url) -> ServiceClient

Bind to already-running local trainer and sampler processes without contacting the control plane. Pass trainer_url and sampler_url as the base URLs of local server processes. Useful for end-to-end testing.

service_client.create_lora_training_client(base_model, rank=32, seed=None, timeout=600.0, ready_timeout=3600.0, wandb_config=None) -> TrainingClient

Provision a TrainerServer for the given Hugging Face base_model and return a connected TrainingClient. The control plane also provisions a paired sampling server in the same call; save_weights_and_get_sampling_client uses that paired URL to gate on version readiness. Pass a WandbConfig instance to stream training metrics to a Weights & Biases run.

service_client.create_sampling_client(base_model=None, timeout=300.0, ready_timeout=3600.0, *, model_path=None) -> SamplingClient

Provision a standalone SamplingServer and return a connected SamplingClient. Use this when you want to sample from a model independently of a training run. Pass base_model to load the base weights only, or model_path (a bt://loops:<run_id>/sampler_weights/<checkpoint> URI) to load a specific checkpoint at startup.
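Since model_path is a plain string, it can help to build and validate it in one place. The helpers below are a minimal sketch based only on the URI shapes shown in this reference (sampler_weights here, weights for load_state). The names sampler_weights_uri, parse_loops_uri, and LoopsUri are hypothetical, and the assumption that a run ID contains no slash is mine, not the SDK's.

```python
import re
from typing import NamedTuple

# Matches the documented shape bt://loops:<run_id>/<kind>/<checkpoint>.
# Assumption: run IDs never contain "/".
_BT_URI = re.compile(
    r"^bt://loops:(?P<run_id>[^/]+)/(?P<kind>weights|sampler_weights)/(?P<checkpoint>.+)$"
)


class LoopsUri(NamedTuple):
    run_id: str
    kind: str  # "weights" or "sampler_weights"
    checkpoint: str


def sampler_weights_uri(run_id: str, checkpoint: str) -> str:
    """Build a sampler checkpoint URI suitable for model_path."""
    return f"bt://loops:{run_id}/sampler_weights/{checkpoint}"


def parse_loops_uri(uri: str) -> LoopsUri:
    """Split a bt://loops URI into its parts, or raise ValueError."""
    match = _BT_URI.match(uri)
    if match is None:
        raise ValueError(f"not a bt://loops URI: {uri!r}")
    return LoopsUri(**match.groupdict())
```

Validating the string locally surfaces a malformed path before a sampler is provisioned for it.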

service_client.get_server_capabilities() -> ServerCapabilities

Return the control plane’s view of supported base models and the GPU classes it can provision them on. Useful for confirming a base model is available before calling create_lora_training_client.

service_client.list_checkpoints(run_id) -> list[Checkpoint]

List checkpoints saved by the run identified by run_id. Calls the Baseten API, not the trainer server directly.

service_client.get_checkpoint_archive_url(checkpoint_id, page_size=1000, page_token=0) -> CheckpointFilesResponse

Return presigned URLs for every file in the specified checkpoint folder. Checkpoint IDs are globally unique, so no run scoping is required. The Loops stack writes checkpoints as unzipped directories rather than archives, so this returns a file list instead of a single archive URL.
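Because the response is a list of files rather than one archive, a common first step is to summarize it before downloading. The sketch below totals size_bytes per node_rank using the CheckpointFile field names listed under Types. It runs against stand-in objects rather than a real response, and the file names are made up for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable


def bytes_per_node(files: Iterable) -> Dict[int, int]:
    """Sum size_bytes per node_rank over CheckpointFile-like objects."""
    totals: Dict[int, int] = defaultdict(int)
    for f in files:
        totals[f.node_rank] += f.size_bytes
    return dict(totals)


# Stand-in for one page of a CheckpointFilesResponse (no network call).
# The file names below are illustrative only.
page = SimpleNamespace(presigned_urls=[
    SimpleNamespace(relative_file_name="adapter.bin", node_rank=0, size_bytes=1024),
    SimpleNamespace(relative_file_name="state.bin", node_rank=0, size_bytes=2048),
    SimpleNamespace(relative_file_name="state.bin", node_rank=1, size_bytes=2048),
])
totals = bytes_per_node(page.presigned_urls)
print(totals)
```

With a real client, the same function would be fed the presigned_urls list of each page returned by get_checkpoint_archive_url.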

service_client.session_id -> str

The session ID assigned by the control plane. Available after construction.

TrainingClient: train

TrainingClient talks directly to a dp_worker instance. Long-running operations use a submit-and-retrieve protocol: the submit fires immediately on the calling thread (so validation errors surface at call time) and .result() long-polls the server until the operation finishes. You can submit multiple operations before awaiting any of them.

TrainingClient.forward_backward(data, loss_fn="cross_entropy", loss_fn_config=None) -> ForwardBackwardFuture

Run a forward and backward pass over data (a list of Datum objects) using the specified loss function. Returns a ForwardBackwardFuture; call .result() to block until the pass completes and retrieve the loss.

TrainingClient.forward(data, loss_fn="cross_entropy", loss_fn_config=None) -> ForwardBackwardFuture

Run a forward pass without gradient computation. Same inputs and output shape as forward_backward, but the gradient buffer is left untouched, so it is safe to interleave with gradient accumulation steps.

TrainingClient.optim_step(adam_params) -> OperationFuture[OptimStepResponse]

Apply the accumulated gradients using the Adam optimizer configured by adam_params. Call this after one or more forward_backward calls.
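Gradient accumulation follows directly from the submit-and-retrieve protocol: submit several forward_backward calls, wait for them, then apply one optimizer step. The function below is a sketch under these assumptions: training_client is any object with the forward_backward and optim_step methods documented here, adam_params is an AdamParams instance, and accumulate_and_step and micro_batches are hypothetical names.

```python
from typing import Any, List, Sequence, Tuple


def accumulate_and_step(
    training_client: Any,
    micro_batches: Sequence[list],
    adam_params: Any,
) -> Tuple[List[Any], Any]:
    """Accumulate gradients over several micro-batches, then step once."""
    # Submit every pass first; each submit returns a future right away,
    # so validation errors surface here rather than at .result().
    futures = [training_client.forward_backward(batch) for batch in micro_batches]
    # Block until every pass has finished and its gradients are accumulated.
    outputs = [future.result() for future in futures]
    # Apply the accumulated gradients with a single optimizer step.
    step_result = training_client.optim_step(adam_params).result()
    return outputs, step_result
```

Waiting for the forward_backward results before submitting optim_step is the conservative ordering; it avoids relying on any server-side ordering guarantee that this reference does not state.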

TrainingClient.save_state(name, ttl_seconds=None) -> OperationFuture[SaveWeightsResponse]

Persist a local training checkpoint under name. When a weight sync URI is configured server-side, save_state also publishes the LoRA adapter so a polling sampler can hot-swap to the new weights.

TrainingClient.save_weights_for_sampler(name, ttl_seconds=None) -> OperationFuture[SaveWeightsResponse]

Publish the LoRA adapter to the paired sampling server under name without returning a snapshot-pinned SamplingClient. Use this when you don’t need the version gate that save_weights_and_get_sampling_client provides.

TrainingClient.save_weights_and_get_sampling_client(name) -> _ComposedFuture[SamplingClient]

Publish the LoRA adapter to the paired sampling server under name and return a future that resolves to a SamplingClient pinned to the newly published version. Calling .result() runs two stages: the trainer publishes weights, then the SDK polls the sampler until at least one replica reports the new version loaded. The sampler-wait phase has a fixed 600-second ceiling independent of the timeout= you pass to .result(); if no replica reports the new version by then, the call raises RuntimeError. The returned SamplingClient carries X-Min-Policy-Version on every subsequent sample() call, so requests only land on replicas that have the right weights.
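A training loop that evaluates after each publish has to decide what to do when the sampler-wait ceiling is hit. The sketch below returns None in that case so the caller can skip the evaluation and keep training. publish_and_sample is a hypothetical helper, and it assumes the sample() parameters after sampling_params have defaults, which this reference does not confirm.

```python
from typing import Any, Optional


def publish_and_sample(
    training_client: Any,
    name: str,
    prompt: Any,
    sampling_params: Any,
    num_samples: int = 1,
    publish_timeout: Optional[float] = None,
) -> Optional[Any]:
    """Publish weights under `name`, then sample from the pinned client."""
    future = training_client.save_weights_and_get_sampling_client(name)
    try:
        if publish_timeout is None:
            sampler = future.result()
        else:
            sampler = future.result(timeout=publish_timeout)
    except RuntimeError:
        # No replica reported the new version within the sampler-wait
        # ceiling; let the caller skip this evaluation.
        return None
    # The returned client is pinned, so this call only lands on replicas
    # that already hold the newly published weights.
    return sampler.sample(
        prompt=prompt,
        num_samples=num_samples,
        sampling_params=sampling_params,
    )
```

Note that publish_timeout only bounds the .result() call; per the description above, the sampler-wait phase keeps its own fixed ceiling.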

TrainingClient.load_state(path) -> OperationFuture[LoadWeightsResponse]

Load weights from a bt://loops:<run_id>/weights/<checkpoint> URI into this trainer. Use this to resume training from a checkpoint.

TrainingClient.load_state_with_optimizer(path) -> OperationFuture[LoadWeightsResponse]

Same as load_state but also restores Adam moments. Use when you want bit-exact resumption.
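The choice between the two load methods can be wrapped in one resume helper. This is a minimal sketch: resume_training and its with_optimizer flag are hypothetical, and it assumes load_state_with_optimizer accepts the same URI shape as load_state.

```python
from typing import Any, Optional


def resume_training(
    training_client: Any,
    run_id: str,
    checkpoint: str,
    with_optimizer: bool = True,
    timeout: Optional[float] = None,
) -> Any:
    """Load a training checkpoint and block until the trainer has it."""
    path = f"bt://loops:{run_id}/weights/{checkpoint}"
    if with_optimizer:
        # Restores the Adam moments along with the weights.
        future = training_client.load_state_with_optimizer(path)
    else:
        # Weights only; the optimizer state is not restored.
        future = training_client.load_state(path)
    return future.result() if timeout is None else future.result(timeout=timeout)
```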

TrainingClient.list_checkpoints() -> list[Checkpoint]

List checkpoints for the run bound to this client. Requires that this client was constructed via ServiceClient.create_lora_training_client (which populates the necessary session and run IDs automatically).

TrainingClient.get_checkpoint_archive_url(checkpoint_id, page_size=1000, page_token=0) -> CheckpointFilesResponse

Return presigned URLs for every file in a checkpoint folder. Same semantics as ServiceClient.get_checkpoint_archive_url.

TrainingClient.create_sampling_client(model_path) -> SamplingClient

Return a SamplingClient bound to the paired sampler, loading the weights at model_path (a bt://loops:<run_id>/sampler_weights/<checkpoint> URI). Distinct from ServiceClient.create_sampling_client, which provisions a fresh sampler.

TrainingClient.get_tokenizer()

Return the Hugging Face PreTrainedTokenizer for the base model. Cached after the first load.

TrainingClient.get_info() -> GetInfoResponse

Return the model configuration for this training session (base model name, LoRA rank, and max sequence length) without a server round-trip.

TrainingClient.run_id -> str

The run ID this client is bound to. Use this when filtering checkpoints or making HTTP API calls against the same run.

TrainingClient.policy_version -> int

The current policy version the trainer has published. Incremented on each save_weights_and_get_sampling_client (or save_weights_for_sampler) call.

SamplingClient: sample

SamplingClient generates text completions from the model the sampler currently has loaded. There are two creation paths with different version semantics: ServiceClient.create_sampling_client returns an auto-updating client that follows whatever weights the sampler currently holds, while TrainingClient.save_weights_and_get_sampling_client returns a snapshot client pinned to the trained version. Both clients expose the same sample method.

SamplingClient.sample(prompt, num_samples, sampling_params, include_prompt_logprobs, topk_prompt_logprobs, *, model) -> SampleResult

Generate num_samples completions from prompt (a ModelInput). Pass a SamplingParams instance to control temperature, top-p, top-k, max tokens, and stop sequences. Set include_prompt_logprobs=True to get per-token log-probabilities for the input tokens alongside the output. Pass model (keyword-only) to override the sampler’s auto-detected adapter or base model for a single call. This is useful for explicitly targeting the base via discover_base_model_name() or a specific adapter.

SamplingClient.compute_logprobs(prompt) -> list[float | None]

Return the per-token log-probabilities for prompt without generating any new tokens. Index 0 is always None because the first token has no preceding context to score against. Other positions may also be None if the sampler can’t compute a log-probability for that token.
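Because the returned list can contain None at index 0 and elsewhere, any aggregate has to skip those positions. The sketch below computes the mean log-probability and the corresponding perplexity over the scored tokens only. summarize_logprobs is a hypothetical helper that works on any list of this shape.

```python
import math
from typing import List, Optional, Tuple


def summarize_logprobs(logprobs: List[Optional[float]]) -> Tuple[float, float, int]:
    """Return (mean log-prob, perplexity, scored token count), skipping None."""
    scored = [lp for lp in logprobs if lp is not None]
    if not scored:
        raise ValueError("no scored tokens in logprobs")
    mean = sum(scored) / len(scored)
    # Perplexity is the exponential of the negative mean log-probability.
    return mean, math.exp(-mean), len(scored)


# Index 0 is None (no preceding context); another position is unscored too.
mean_lp, perplexity, n_scored = summarize_logprobs([None, -1.0, None, -3.0])
print(mean_lp, n_scored)
```

Dividing by the number of scored tokens rather than the prompt length keeps the average from being biased by the None entries.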

SamplingClient.discover_base_model_name() -> str

Return the base model ID from the sampler’s /v1/models list, specifically the entry with no parent. Retries with backoff while the sampler is still deploying.

SamplingClient.discover_adapter_name() -> str | None

Return the currently registered LoRA adapter ID, or None if no adapter is loaded. sample() calls this internally when model is not passed, and caches the result until the adapter changes.

SamplingClient.get_tokenizer()

Return the Hugging Face PreTrainedTokenizer for the base model this client was created with. Raises ValueError if base_model was not set at construction time.

Types

Datum: A single training example: a ModelInput paired with a dict of TensorData loss function inputs.

ModelInput: A tokenized prompt, represented as a list of ModelInputChunk objects. Construct with ModelInput.from_ints(token_ids) for the common case.

ModelInputChunk: A discriminated union of EncodedTextChunk (a list of token IDs) and ImageChunk (a base64-encoded image with an expected token count).

TensorData: A serializable tensor with a flat data list, a dtype string, and a shape. Convert to and from torch.Tensor with TensorData.to_torch() and TensorData.from_torch(tensor).

SamplingParams: Controls for text generation: temperature, top_p, top_k, max_tokens, seed, and stop.

AdamParams: Optimizer hyperparameters: learning_rate, beta1, beta2, eps, weight_decay, and grad_clip_norm.

SampledSequence: A single generated sequence: a list of output token IDs, optional per-token log-probabilities, and a stop reason.

SampleResult: The full response from sample(): a list of SampledSequence objects, an optional PromptLogprobs, and the policy_version the sampler replica was running.

Checkpoint: Metadata for a saved checkpoint, populated by list_checkpoints().

CheckpointFilesResponse: A paginated list of presigned file URLs for a checkpoint, populated by get_checkpoint_archive_url().

CheckpointFile: One entry in a CheckpointFilesResponse.presigned_urls list: a presigned URL plus relative_file_name, node_rank, size_bytes, and last_modified metadata.

WandbConfig: Optional Weights & Biases settings (project, entity, run_name, tags) passed to create_lora_training_client to stream training metrics.

ForwardBackwardOutput, OptimStepResponse, SaveWeightsResponse, SaveWeightsForSamplerResponse, LoadWeightsResponse, InitTrainerServerResponse, SampleResponse: Response payloads returned by the matching TrainingClient and SamplingClient methods.

ServerCapabilities, SupportedModel: Returned by ServiceClient.get_server_capabilities(); describe which base models the control plane can provision and on which GPU classes.

OperationFuture[T]: A handle to a long-running training operation. Call .result() or .result(timeout=seconds) to block until the operation completes and return the result. The forward and forward_backward methods return a ForwardBackwardFuture subclass, and save_weights_and_get_sampling_client returns a composed future; both expose the same .result() contract.

Errors

RemoteOpError: The server reports that an async operation failed. The error_class attribute carries the server-side exception class name (for example, "ValueError" or "DispatcherError"), which is useful for routing in caller code.

UnknownRequestError: The server returned 404 for an operation ID. This can mean the server has no record of the operation (after a pod restart, for example) or that the result was TTL-evicted. Resubmit the operation if the work is still needed; the server’s idempotency-key deduplication prevents double-execution.

ServerShutdownError: The server is shutting down (503 response). Retry the request against a different replica.
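The three errors suggest different reactions in caller code: resubmit on UnknownRequestError, retry on ServerShutdownError, and route on error_class for RemoteOpError. The sketch below shows one way to wire that up. The exception classes are local stand-ins so the example runs on its own (in real code they come from the SDK, whose import path is not given here), and run_with_retries, max_attempts, and backoff_seconds are hypothetical.

```python
import time
from typing import Any, Callable


# Local stand-ins for the SDK exceptions, defined only so this runs alone.
class RemoteOpError(Exception):
    def __init__(self, message: str, error_class: str):
        super().__init__(message)
        self.error_class = error_class


class UnknownRequestError(Exception):
    pass


class ServerShutdownError(Exception):
    pass


def run_with_retries(
    submit: Callable[[], Any],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call submit().result(), resubmitting on the two retryable errors."""
    last_error: Exception = RuntimeError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return submit().result()
        except (UnknownRequestError, ServerShutdownError) as exc:
            # Lost operation or draining server: submit the work again.
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(backoff_seconds * (2 ** attempt))
        except RemoteOpError as exc:
            # The operation itself failed; route on the server-side class.
            if exc.error_class == "ValueError":
                raise ValueError(str(exc)) from exc
            raise
    raise last_error
```

A typical call wraps the submit in a lambda, for example run_with_retries(lambda: training_client.optim_step(adam_params)). The sketch cannot choose a replica, so a retry after ServerShutdownError simply resubmits and relies on routing to reach a healthy one.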

Tinker compatibility shim

Install with the [tinker] extra (uv add 'baseten-loops[tinker]') and import tinker. The shim wheel maps Tinker’s client interface onto the Loops SDK, so existing training scripts that import from tinker run without modification. The underlying classes (ServiceClient, TrainingClient, SamplingClient) and every method on them are the same; only the import path changes. For the full list of mapped names and any behavioral differences, see the Tinker compatibility guide.