Deploy a checkpoint

A sampler checkpoint deploys to a dedicated Baseten inference deployment that serves the base model with your LoRA adapter loaded. Deployment works whether or not the training session is still running: checkpoints outlive the session, so you can shut the trainer and sampler down first and deploy later. Deploying requires a Hugging Face access token stored in your workspace secrets (named hf_access_token by default), because the deployment downloads the base weights from Hugging Face.

Deploy from the CLI

Run truss loops checkpoints deploy with the checkpoint’s globally unique id (the id field from listing checkpoints, not the checkpoint name):

uvx truss loops checkpoints deploy --checkpoint-ids <id>

The --checkpoint-ids flag skips the checkpoint picker, but the command still prompts for a model name, GPU type, GPU count, and the Hugging Face secret name (default hf_access_token). A successful deploy prints the IDs you need to call the model:

Successfully created deployment: deployment-1
Model ID: wnpkzp03
Deployment ID: qrpg2r0
Deployment succeeded.
Set the model parameter on each request to the Loops checkpoint name (e.g. step-100).

The deployment downloads base weights and starts the inference server, so several minutes pass before it reaches ACTIVE. To inspect the generated Truss config without deploying, pass --dry-run.
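If you script around the CLI, the two IDs can be pulled out of the success output shown above. This is a minimal sketch: parse_deploy_output is a hypothetical helper, and it assumes the output keeps the "Model ID:" and "Deployment ID:" line format from the sample, which may change between CLI versions.

```python
import re

# Sample output copied from the successful deploy shown above.
DEPLOY_OUTPUT = """\
Successfully created deployment: deployment-1
Model ID: wnpkzp03
Deployment ID: qrpg2r0
Deployment succeeded.
"""


def parse_deploy_output(text: str) -> dict[str, str]:
    """Extract the model and deployment IDs from the CLI success output.

    Hypothetical helper: assumes each ID sits on its own line as
    "<label>: <value>", matching the sample output.
    """
    ids = {}
    for key, label in (("model_id", "Model ID"), ("deployment_id", "Deployment ID")):
        match = re.search(rf"^{label}:\s*(\S+)\s*$", text, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"{label!r} not found in deploy output")
        ids[key] = match.group(1)
    return ids


ids = parse_deploy_output(DEPLOY_OUTPUT)
print(ids["model_id"], ids["deployment_id"])
```

Raising on a missing label, rather than returning a partial result, keeps a failed or reformatted deploy from silently producing a broken request URL later.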

Deploy from the dashboard

Every Loop has a page in the dashboard at app.baseten.co/training/loop/<run_id>, reachable from the Training tab. Its Checkpoints table lists each saved checkpoint with its bt:// path, type, and size. Deploy a checkpoint from that table; the flow matches deploying job checkpoints.

Call the deployed model

Call the deployment’s OpenAI-compatible chat completions route. The URL includes both the model ID and the deployment ID. Set model to the checkpoint name (checkpoint_id, for example step-1), not its globally unique id:

curl -X POST "https://model-<model_id>.api.baseten.co/deployment/<deployment_id>/sync/v1/chat/completions" \
  -H "Authorization: Bearer $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "step-1",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'

{
  "model": "step-1",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is **Paris**.\n\nLocated in northeastern France…"
      },
      "finish_reason": "length"
    }
  ]
}
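The same request can be built from Python with only the standard library. This is a sketch, not an official client: build_chat_request, send, and first_reply are hypothetical helper names, and the URL pattern and body fields are taken from the curl example above.

```python
import json
import urllib.request


def build_chat_request(model_id, deployment_id, checkpoint_name, prompt, api_key):
    """Build (but do not send) the chat completions request for a deployment."""
    url = (
        f"https://model-{model_id}.api.baseten.co"
        f"/deployment/{deployment_id}/sync/v1/chat/completions"
    )
    body = {
        # "model" is the checkpoint name (e.g. "step-1"), not the global id.
        "model": checkpoint_name,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def first_reply(response_body):
    """Return (content, finish_reason) for the first choice."""
    choice = response_body["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


# Usage, with the IDs printed by the deploy command and
# BASETEN_API_KEY read from the environment:
#   req = build_chat_request("<model_id>", "<deployment_id>", "step-1",
#                            "What is the capital of France?",
#                            os.environ["BASETEN_API_KEY"])
#   content, finish_reason = first_reply(send(req))
```

In the OpenAI chat completions format, a finish_reason of "length", as in the sample response above, means generation stopped at the token limit rather than at a natural end. Assuming the deployment honors the standard max_tokens field, raising it in the request body should give a complete answer.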

For request options, streaming, and client libraries, see calling your model.

Delete the deployment

The deployment bills for its GPU while it’s live. When you’re done, delete it with DELETE /v1/models/{model_id}, using the model ID printed at deploy time:

curl -X DELETE "https://api.baseten.co/v1/models/<model_id>" \
  -H "Authorization: Bearer $BASETEN_API_KEY"

{"id": "wnpkzp03", "deleted": true}

Deleting the deployment doesn’t touch the checkpoint. You can redeploy it any time.
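For cleanup scripts, the delete call and a check on its response can be sketched in Python as follows. build_delete_request and confirm_deleted are hypothetical helpers. The endpoint and the response shape come from the curl example above; treating any other response shape as a failure is an assumption of this sketch.

```python
import json
import urllib.request


def build_delete_request(model_id, api_key):
    """Build (but do not send) the DELETE request for a model."""
    return urllib.request.Request(
        f"https://api.baseten.co/v1/models/{model_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )


def confirm_deleted(response_text, model_id):
    """Return True only if the response names this model and marks it deleted."""
    payload = json.loads(response_text)
    return payload.get("id") == model_id and payload.get("deleted") is True


# Usage (sends the request; BASETEN_API_KEY read from the environment):
#   req = build_delete_request("<model_id>", os.environ["BASETEN_API_KEY"])
#   with urllib.request.urlopen(req) as resp:
#       assert confirm_deleted(resp.read().decode("utf-8"), "<model_id>")
```

Checking the id as well as the deleted flag guards against a script deleting one model while reporting success for another, which matters because the deployment keeps billing until the delete succeeds.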

Next steps

Train on a dataset: Move from the quickstart’s single example to a real training loop.
Loops concepts: How checkpoints relate to sessions, trainers, and samplers.
