Sessions
A Loops session is the container resource that scopes a training project’s work. It holds the trainer server and sampling server for a given base model and links them to a Baseten training project. Everything you create within a session (trainer servers, sampling servers, checkpoints) is queryable through that session’s ID. For the full route reference, see the Loops API overview.Trainer servers
A trainer server is the process that runs the training computation: forward pass, backward pass, and optimizer step. It owns the model weights for the duration of the session and writes checkpoints to a dedicated storage path under abt://loops:… URI. There is one trainer per session per base model. The trainer defaults to the longest sequence length the base model supports; Baseten picks the GPU type, GPU count, and node topology (single-node or multi-node) that supports it. When you call POST /v1/loops/runs, Baseten provisions the trainer server alongside its paired sampling server and returns both resource IDs. The API route calls a trainer server a “run”. Both the HTTP API and the SDK identify it by its run ID: the API takes a run_id query parameter, and the SDK exposes the same value as TrainingClient.run_id.
Sampling servers
A sampling server runs inference from the trainer’s current weights. It’s provisioned alongside the trainer and linked to it at creation time. The sampler receives new weights through the weight-sync runtime whenever the trainer saves them. See How weight sync works for the mechanism. Because the sampler doesn’t restart during a session, generation latency stays low even as weights change, and you can interleave training steps and rollout calls without coordinating reloads.Checkpoints
Every time the trainer saves weights, Loops creates a checkpoint identified by abt://loops:<run_id>/(weights|sampler_weights)/<checkpoint_name> URI. The URI encodes the run ID, the checkpoint target (trainer weights or sampler weights), and the checkpoint name, for example, bt://loops:k4q95w5/weights/step-100. You pass this URI to create a trainer or sampler server from a prior checkpoint, or to deploy weights to inference.
Checkpoints are stored as folders on disk, not as single archives. Listing checkpoint files returns a paginated response of presigned URLs, one URL per file in the folder, controlled by page_size and page_token query parameters. This differs from Tinker’s single-archive download shape: Tinker returns one URL you download and unpack; Loops returns a page of per-file URLs you fetch individually. If your client code unpacks a Tinker archive today, you’ll need to adapt it to iterate the paginated file list instead. The route is GET /v1/loops/checkpoints/{checkpoint_id}/files.
Deployments
A Loops deployment is the trainer and sampler you create at the start of a session. They stay live as you train, and weights you commit stream into the sampler in place, with no separate deploy step for inference. Start a deployment withtruss loops push <base_model>. Shut it down with truss loops deactivate <deployment_id>, using the deployment ID from truss loops view.
Reusing infrastructure across sessions
By default, every newServiceClient creates a fresh session, which provisions a new trainer and sampler. Each re-run of a script pays the full cold-start cost.
A session can opt in to reusing a prior session’s trainer and sampler instead of provisioning new ones. Three equivalent surfaces:
- SDK kwarg:
tinker.ServiceClient(reuse_from_session_id="sess_…"). - Environment variable:
LOOPS_REUSE_FROM_SESSION_ID=sess_….ServiceClientreads this when no kwarg is passed. - HTTP request:
reuse_from_session_idfield onPOST /v1/loops/runsandPOST /v1/loops/samplers.