The truss train command provides subcommands for managing the full training job lifecycle.
init
Initialize a training project from templates or create an empty project.
Options
--list-examples
List all available examples.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Initialize a project from a template:
push
Submit and run a training job.
Arguments
Path to the training configuration file (for example, config.py).
Options
Remote to use.
--tail
Tail for status + logs after push.
Name of the training job.
Team name for the training project
The --team flag is only available if your organization has teams enabled. Contact us to enable teams, or see Teams for more information.
Interactive session trigger mode
Interactive session timeout in minutes
Accelerator type and count (e.g., H200:8)
Number of compute nodes
Entrypoint command.
Job priority (higher values run first when capacity frees up).
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
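For automated pipelines, the --tail and --non-interactive options listed above can be combined on one push. The following is a minimal sketch, not an official recipe: the function name submit_training_job and the default path are hypothetical, and it assumes the configuration path is passed positionally as described under Arguments.

```shell
# Hypothetical CI helper; the function name and default path are illustrative.
submit_training_job() {
  config_path="${1:-config.py}"
  # --tail follows status and logs after the push.
  # --non-interactive disables prompts so the step cannot wait for input.
  truss train push "$config_path" --tail --non-interactive
}
```

Calling submit_training_job with no argument pushes config.py; passing a path pushes that file instead.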
Examples
Submit a training job:
logs
Fetch and stream logs from a training job.
Options
Remote to use.
Project ID.
Project name or project id.
Job ID.
--tail
Tail for ongoing logs.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Stream logs for a specific job:
metrics
View real-time metrics for a training job including CPU, GPU, and storage usage.
Options
Project ID.
Project name or project id.
Job ID.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
View metrics for a specific job:
view
List training projects and jobs, or view details for a specific job. This command lists jobs in the TRAINING_JOB_PENDING state (waiting for GPU capacity) alongside other active jobs.
Options
View training jobs for a project.
Project name or project id.
View a specific training job.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
List all training projects:
stop
Stop a running or pending training job.
Options
Project ID.
Project name or project id.
Job ID.
--all
Stop all running jobs.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Stop a specific job:
recreate
Recreate an existing training job with the same configuration.
Options
Job ID of the training job to recreate.
Remote to use.
--tail
Tail for status + logs after recreation.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Recreate a specific job:
download
Download training job artifacts to your local machine.
Options
Job ID.
Remote to use.
Directory where the file should be downloaded. Defaults to current directory.
--no-unzip
Instructs truss to not unzip the folder upon download.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Download artifacts to current directory:
deploy_checkpoints
Deploy a trained model checkpoint to Baseten’s inference platform.
Options
Project ID.
Project name or project id.
Job ID.
Path to a Python file that defines a DeployCheckpointsConfig.
--dry-run
Generate a truss config without deploying
Path to output the truss config to. If not provided, will output to truss_configs/<model_version_name>_<model_version_id> or truss_configs/dry_run_<timestamp> if dry run.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Deploy checkpoints interactively:
Output
After a successful deployment, the command prints a labeled block with the Model ID, Deployment ID, and a link to the deployment’s logs page.
get_checkpoint_urls
Get presigned URLs for checkpoint artifacts.
Options
Job ID.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
Get checkpoint URLs for a job:
checkpoints list
List and interactively explore checkpoints for a training job.
Options
Remote to use.
Project ID.
Project name or project id.
Job ID.
Jump directly into a specific checkpoint’s files.
Sort checkpoints by checkpoint-id, size, created date, or type.
Sort order: ascending or descending.
Output format: cli-table (default), csv, or json.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Interactive mode
When using the default cli-table format in an interactive terminal, the command launches a checkpoint explorer:
- Checkpoint picker: fuzzy-search and select a checkpoint from the list.
- File explorer: navigate the checkpoint’s directory tree. Press → or Enter to open a directory or view a file. Press ← to go back. Press Ctrl-C to quit.
For .safetensors files, the explorer displays a tensor summary (layer names, dtypes, shapes, and parameter counts) instead of raw binary content. Text files display with syntax highlighting based on their file extension (for example, .json, .py, .yaml, .toml), falling back to plain text for unrecognized types.
Examples
List checkpoints for the most recent job:
cache summarize
View a summary of the training cache for a project.
Arguments
Project name or project ID.
Options
Remote to use.
Sort files by filepath, size, modified date, file type, or permissions.
Sort order: ascending or descending.
Output format: cli-table (default), csv, or json.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
View cache summary:
isession
View or update interactive session details for a training job, including auth codes and connection status.
Options
Job ID of the training job.
Remote to use.
Minutes to extend the session timeout by
Change the session trigger (cannot be changed on on_startup sessions)
Output format (default: table)
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
Examples
View session details for a job:
update_session
Update the interactive session configuration on a running training job. At least one of --trigger or --timeout-minutes must be provided.
Arguments
Job ID of the training job to update.
Options
When to create the interactive session: ‘on_startup’ creates on job start, ‘on_failure’ creates on job failure, ‘on_demand’ allows manual session creation.
Number of minutes before the interactive session times out.
Remote to use.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
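The requirement that at least one of --trigger or --timeout-minutes be provided can be checked before the command is invoked. The sketch below is illustrative only: update_training_session is a hypothetical wrapper, and it assumes each flag takes its value as the following argument and that the job ID is passed positionally as described under Arguments.

```shell
# Hypothetical wrapper; the name and argument order are illustrative.
# Usage: update_training_session JOB_ID [TRIGGER] [TIMEOUT_MINUTES]
update_training_session() {
  job_id="$1"
  trigger="${2:-}"
  timeout_minutes="${3:-}"
  # Mirror the documented rule: at least one of the two options is required.
  if [ -z "$trigger" ] && [ -z "$timeout_minutes" ]; then
    echo "provide --trigger, --timeout-minutes, or both" >&2
    return 1
  fi
  # Build the argument list, adding only the options that were supplied.
  set -- truss train update_session "$job_id"
  if [ -n "$trigger" ]; then set -- "$@" --trigger "$trigger"; fi
  if [ -n "$timeout_minutes" ]; then set -- "$@" --timeout-minutes "$timeout_minutes"; fi
  "$@"
}
```

Passing an empty string for TRIGGER leaves the trigger untouched and changes only the timeout. The trigger values described above are on_startup, on_failure, and on_demand.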
Examples
Change the session trigger:
workstation
Spin up an SSH workstation on Baseten training infrastructure.
Options
GPU type for the workstation (default: H100).
Number of GPUs for a single-node workstation (1-8, default: 1). Mutually exclusive with --node-count.
Name of the training project that owns the workstation. Defaults to workstation-{accelerator}, for example workstation-H100.
Number of full nodes to provision, each using all of its GPUs. Values above 1 bootstrap a Slurm cluster across the nodes. Mutually exclusive with --gpu-count.
See Slurm workstations for the cluster topology, verification steps, and how to launch distributed work.
Orchestrator bootstrapped across multi-node workstations. slurm is the only supported value. Ignored for single-node workstations.
Docker base image for every node (default: nvidia/cuda:12.8.1-devel-ubuntu24.04). Multi-node workstations install Slurm with apt at startup, so use a Debian-based image.
--enable-checkpointing
Mount checkpoint storage on the workstation. See Checkpoints.
Path inside the container to save checkpoints.
Checkpoint volume size in GiB.
Job ID to load the latest checkpoint from.
Name of the remote in .trussrc to use.
--tail
Stream workstation status and logs after launch.
Customizes logging.
--non-interactive
Disables interactive prompts, use in CI / automated execution contexts.
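Because --gpu-count and --node-count are mutually exclusive, a launch script can reject the combination before calling the CLI. This is a rough sketch under stated assumptions: launch_workstation is a hypothetical helper, and it assumes both flags take their value as the following argument.

```shell
# Hypothetical helper; the name and argument order are illustrative.
# Usage: launch_workstation [GPU_COUNT] [NODE_COUNT]
launch_workstation() {
  gpu_count="${1:-}"
  node_count="${2:-}"
  # The two sizing options cannot be combined.
  if [ -n "$gpu_count" ] && [ -n "$node_count" ]; then
    echo "--gpu-count and --node-count are mutually exclusive" >&2
    return 1
  fi
  set -- truss train workstation
  if [ -n "$gpu_count" ]; then set -- "$@" --gpu-count "$gpu_count"; fi
  if [ -n "$node_count" ]; then set -- "$@" --node-count "$node_count"; fi
  # --tail streams workstation status and logs after launch.
  "$@" --tail
}
```

With no arguments the helper launches a workstation with default settings. Passing an empty first argument and a node count requests full nodes instead of a GPU count.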
Examples
Launch a workstation with default settings:
capacity
Show GPU capacity limits and current usage for the organization.
Options
Name of the remote to use
Examples
View capacity for the default remote:
Ignore files and folders
Create a .truss_ignore file in your project root to exclude files from upload. The file uses .gitignore syntax.
.truss_ignore
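As an illustration of the .gitignore syntax, the snippet below writes a small .truss_ignore from the project root. The patterns are hypothetical examples, not defaults shipped with truss; replace them with paths that match your own project.

```shell
# Run from the project root. The patterns below are illustrative only.
cat > .truss_ignore <<'EOF'
# Python bytecode and caches
__pycache__/
*.pyc
# Local virtual environment
.venv/
# Large raw data that should not be uploaded
data/raw/
EOF
```

A trailing slash matches a directory, and a leading # starts a comment, as in .gitignore.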