The truss train command provides subcommands for managing the full training job lifecycle.
init
Initialize a training project from templates or create an empty project.
Options
--list-examples
List all available examples.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Initialize a project from a template:
push
Submit and run a training job.
Arguments
Path to the training configuration file (for example, config.py).
Options
Remote to use.
--tail
Tail for status + logs after push.
Name of the training job.
Team name for the training project.
The --team flag is only available if your organization has teams enabled. Contact us to enable teams, or see Teams for more information.
Interactive session trigger mode.
Interactive session timeout in minutes.
Accelerator type and count (for example, H200:8).
Number of compute nodes.
Entrypoint command.
Job priority (higher values run first when capacity frees up).
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
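For unattended runs, the push flags documented above can be combined in a small wrapper. The following is a minimal sketch rather than an official recipe: the function name submit_training_job is hypothetical, and it assumes that --tail and --non-interactive can be passed together and that the truss CLI is already installed and authenticated.

```shell
# Hypothetical CI helper (not part of truss): submit a job and follow its logs.
# Assumes the truss CLI is installed and already authenticated.
submit_training_job() {
  # Fall back to config.py when no path is given.
  config_path="${1:-config.py}"
  truss train push "$config_path" --tail --non-interactive
}
```

Calling submit_training_job path/to/config.py from a pipeline step would submit the job and keep the step attached to its status and logs.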
Submit a training job:
logs
Fetch and stream logs from a training job.
Options
Remote to use.
Project ID.
Project name or project id.
Job ID.
--tail
Tail for ongoing logs.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Stream logs for a specific job:
metrics
View real-time metrics for a training job including CPU, GPU, and storage usage.
Options
Project ID.
Project name or project id.
Job ID.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
View metrics for a specific job:
view
List training projects and jobs, or view details for a specific job. This command lists jobs in the TRAINING_JOB_PENDING state (waiting for GPU capacity) alongside other active jobs.
Options
View training jobs for a project.
Project name or project id.
View a specific training job.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
List all training projects:
stop
Stop a running or pending training job.
Options
Project ID.
Project name or project id.
Job ID.
--all
Stop all running jobs.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Stop a specific job:
recreate
Recreate an existing training job with the same configuration.
Options
Job ID of the training job to recreate.
Remote to use.
--tail
Tail for status + logs after recreation.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Recreate a specific job:
download
Download training job artifacts to your local machine.
Options
Job ID.
Remote to use.
Directory where the file should be downloaded. Defaults to current directory.
--no-unzip
Instructs truss to not unzip the folder upon download.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Download artifacts to current directory:
deploy_checkpoints
Deploy a trained model checkpoint to Baseten’s inference platform.
Options
Project ID.
Project name or project id.
Job ID.
Path to a Python file that defines a DeployCheckpointsConfig.
--dry-run
Generate a truss config without deploying.
Path to output the truss config to. If not provided, will output to truss_configs/<model_version_name>_<model_version_id>, or to truss_configs/dry_run_<timestamp> if dry run.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Deploy checkpoints interactively:
Output
After a successful deployment, the command prints a clickable link to the deployment’s logs page.
get_checkpoint_urls
Get presigned URLs for checkpoint artifacts.
Options
Job ID.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Get checkpoint URLs for a job:
checkpoints list
List and interactively explore checkpoints for a training job.
Options
Remote to use.
Project ID.
Project name or project id.
Job ID.
Jump directly into a specific checkpoint’s files.
Sort checkpoints by checkpoint-id, size, created date, or type.
Sort order: ascending or descending.
Output format: cli-table (default), csv, or json.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Interactive mode
When using the default cli-table format in an interactive terminal, the command launches a checkpoint explorer:
- Checkpoint picker: fuzzy-search and select a checkpoint from the list.
- File explorer: navigate the checkpoint’s directory tree. Press → or Enter to open a directory or view a file. Press ← to go back. Press Ctrl-C to quit.
For .safetensors files, the explorer displays a tensor summary (layer names, dtypes, shapes, and parameter counts) instead of raw binary content. Text files display with syntax highlighting based on their file extension (for example, .json, .py, .yaml, .toml), falling back to plain text for unrecognized types.
Examples
List checkpoints for the most recent job:
cache summarize
View a summary of the training cache for a project.
Arguments
Project name or project ID.
Options
Remote to use.
Sort files by filepath, size, modified date, file type, or permissions.
Sort order: ascending or descending.
Output format: cli-table (default), csv, or json.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
View cache summary:
isession
View or update interactive session details for a training job, including auth codes and connection status.
Options
Job ID of the training job.
Remote to use.
Minutes to extend the session timeout by.
Change the session trigger (cannot be changed on on_startup sessions).
Output format (default: table).
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
View session details for a job:
update_session
Update the interactive session configuration on a running training job. At least one of --trigger or --timeout-minutes must be provided.
Arguments
Job ID of the training job to update.
Options
When to create the interactive session: ‘on_startup’ creates on job start, ‘on_failure’ creates on job failure, ‘on_demand’ allows manual session creation.
Number of minutes before the interactive session times out.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
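The requirement that at least one of --trigger or --timeout-minutes be provided can be checked before the CLI is called. The sketch below is one way to do that in a script; the function name update_interactive_session and its positional calling convention are hypothetical, and only the update_session subcommand, its job ID argument, and the two flags come from the reference above.

```shell
# Hypothetical helper (not part of truss).
# Usage: update_interactive_session JOB_ID [TRIGGER] [TIMEOUT_MINUTES]
# Pass an empty string for TRIGGER to change only the timeout.
update_interactive_session() {
  job_id="${1:-}"
  trigger="${2:-}"
  timeout_minutes="${3:-}"

  if [ -z "$job_id" ]; then
    echo "error: a job ID is required" >&2
    return 1
  fi
  # Mirror the documented rule: at least one of the two options is needed.
  if [ -z "$trigger" ] && [ -z "$timeout_minutes" ]; then
    echo "error: provide a trigger, a timeout, or both" >&2
    return 1
  fi

  # Rebuild the argument list with only the options that were supplied.
  set -- "$job_id"
  if [ -n "$trigger" ]; then
    set -- "$@" --trigger "$trigger"
  fi
  if [ -n "$timeout_minutes" ]; then
    set -- "$@" --timeout-minutes "$timeout_minutes"
  fi

  truss train update_session "$@"
}
```

For instance, update_interactive_session "$JOB_ID" on_demand would change only the trigger, while update_interactive_session "$JOB_ID" "" 60 would change only the timeout.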
Change the session trigger:
workstation
Spin up an SSH workstation on Baseten training infrastructure.
Options
GPU accelerator type (default: H100).
Number of GPUs (1-8, default: 1). Mutually exclusive with --node-count.
Project name (default: workstation-<accelerator>).
Number of nodes (each with 8 GPUs). Mutually exclusive with --gpu-count.
Multi-node orchestrator (default: slurm).
Custom Docker base image (default: nvidia/cuda:12.8.1-devel-ubuntu24.04).
--enable-checkpointing
Enable checkpoint storage.
Path inside the container to save checkpoints.
Checkpoint volume size in GiB.
Job ID to load the latest checkpoint from.
Remote to use.
--tail
Tail for status + logs after push.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
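Because --gpu-count and --node-count are mutually exclusive, a script can make the choice explicit and validate the documented 1-8 GPU range up front. This is a minimal sketch under those assumptions; the function name launch_workstation and its gpus/nodes calling convention are hypothetical and not part of truss.

```shell
# Hypothetical helper (not part of truss).
# Usage: launch_workstation gpus COUNT   (single node, 1-8 GPUs)
#        launch_workstation nodes COUNT  (multi-node, 8 GPUs per node)
launch_workstation() {
  mode="${1:-}"
  count="${2:-}"

  # Reject empty or non-numeric counts before comparing them.
  case "$count" in
    ''|*[!0-9]*)
      echo "usage: launch_workstation gpus|nodes COUNT" >&2
      return 1
      ;;
  esac

  case "$mode" in
    gpus)
      if [ "$count" -lt 1 ] || [ "$count" -gt 8 ]; then
        echo "error: GPU count must be between 1 and 8" >&2
        return 1
      fi
      truss train workstation --gpu-count "$count"
      ;;
    nodes)
      truss train workstation --node-count "$count"
      ;;
    *)
      echo "usage: launch_workstation gpus|nodes COUNT" >&2
      return 1
      ;;
  esac
}
```

Taking the mode as a single argument means the two flags can never be passed together, so the exclusivity rule is enforced by construction.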
Launch a workstation with default settings:
capacity
Show GPU capacity limits and current usage for the organization.
Options
Name of the remote to use.
Examples
View capacity for the default remote:
Ignore files and folders
Create a .truss_ignore file in your project root to exclude files from upload. The file uses .gitignore syntax.
.truss_ignore
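As an illustration, the snippet below writes a small .truss_ignore from the shell. The patterns are hypothetical examples for a typical Python project, not defaults shipped with truss, so adjust them to your own layout.

```shell
# Write an example .truss_ignore in the current directory.
# Note: this overwrites any existing .truss_ignore.
# The patterns are illustrative assumptions, written in .gitignore syntax.
cat > .truss_ignore <<'EOF'
# Python bytecode caches
__pycache__/
*.pyc

# Local virtual environment
.venv/

# Large local artifacts that should not be uploaded
data/raw/
*.ckpt
EOF
```

Run it from the project root so the file sits next to the training configuration.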