The truss train command provides subcommands for managing the full training job lifecycle.
Universal options
The following options are available for all truss train commands:
- --help: Show help message and exit.
- --non-interactive: Disable interactive prompts (for CI/automated environments).
- --remote TEXT: Name of the remote in .trussrc.
- --log [humanfriendly|w|warning|i|info|d|debug]: Customize logging verbosity.
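As a rough sketch of how these options combine in an automated environment, the script below assembles a truss train view invocation with prompts disabled and debug logging. The remote name my-remote is a hypothetical placeholder; substitute a remote defined in your .trussrc. The command is only printed here; uncomment the last line to run it.

```shell
# Hypothetical remote name; replace with a remote from your .trussrc.
REMOTE_NAME="my-remote"

# Universal options: select the remote, log at debug level, and disable prompts.
VIEW_CMD="truss train view --remote $REMOTE_NAME --log debug --non-interactive"

# Preview the command without contacting the remote.
echo "$VIEW_CMD"

# Uncomment to actually run it:
# $VIEW_CMD
```

Printing the command first makes it easy to confirm in CI logs exactly which remote and verbosity a job used.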
init
Initialize a training project from templates or create an empty project.
Options
Template name or comma-separated list of templates to initialize. See the ML Cookbook for available examples.
Directory to initialize the project in. Defaults to truss-train-init/ for an empty project, or the current directory when --examples is provided.
--list-examples
List all available example templates.
Examples
Initialize a project from a template:
push
Submit and run a training job.
Arguments
Path to the training configuration file (for example, config.py).
Options
--tail
Stream status and logs after submitting the job.
Name for the training job.
Team name for the training project. If not specified, Truss infers the team or prompts for selection.
The --team flag is only available if your organization has teams enabled. Contact us to enable teams, or see Teams for more information.
Trigger mode for an interactive session on the training job. The session uses rSSH or SSH based on session_provider in your config. Options: on_startup, on_failure, on_demand.
Session timeout in minutes.
Override the training job’s entrypoint command. Use "bash" with --interactive for a clean container to experiment in before running anything.
GPU type and count in TYPE:COUNT format (for example, H200:8).
Number of compute nodes for the training job.
Job priority. Higher values run first when capacity frees up.
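A minimal sketch of a submission, using only the configuration-file argument and the --tail flag described above. The path config.py is the example file name from this page, and the script assumes it runs from the directory that contains it. The command is printed rather than executed, so the sketch is safe to try before a real submission.

```shell
# Example configuration path taken from this page; adjust to your project.
CONFIG_PATH="config.py"

# Submit the job and stream status and logs afterwards.
PUSH_CMD="truss train push $CONFIG_PATH --tail"

# Warn early if the configuration file is not where we expect it.
if [ ! -f "$CONFIG_PATH" ]; then
  echo "warning: $CONFIG_PATH not found in $(pwd)" >&2
fi

# Preview the command.
echo "$PUSH_CMD"

# Uncomment to actually submit:
# $PUSH_CMD
```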
Examples
Submit a training job:
logs
Fetch and stream logs from a training job.
Options
Job ID to fetch logs from.
Project name or project ID.
Project ID.
--tail
Continuously stream new logs.
Examples
Stream logs for a specific job:
metrics
View real-time metrics for a training job, including CPU, GPU, and storage usage.
Options
Job ID to fetch metrics from.
Project name or project ID.
Project ID.
Examples
View metrics for a specific job:
view
List training projects and jobs, or view details for a specific job. This command lists jobs in the TRAINING_JOB_PENDING state (waiting for GPU capacity) alongside other active jobs.
Options
View details for a specific training job.
View jobs for a specific project (name or ID).
View jobs for a specific project ID.
Examples
List all training projects:
stop
Stop a running or pending training job.
Options
Job ID to stop.
Project name or project ID.
Project ID.
--all
Stop all running jobs. Prompts for confirmation.
Examples
Stop a specific job:
recreate
Recreate an existing training job with the same configuration.
Options
Job ID of the training job to recreate. If not provided, defaults to the last created job.
--tail
Stream status and logs after recreating the job.
Examples
Recreate a specific job:
download
Download training job artifacts to your local machine.
Options
Job ID to download artifacts from.
Directory to download files to.
--no-unzip
Keep the compressed archive without extracting.
Examples
Download artifacts to the current directory:
deploy_checkpoints
Deploy a trained model checkpoint to Baseten’s inference platform.
Options
Job ID containing the checkpoints to deploy.
Project name or project ID.
Project ID.
Path to a Python file defining a DeployCheckpointsConfig.
--dry-run
Generate a Truss config without deploying. Useful for previewing the deployment configuration.
Path to output the generated Truss config. Defaults to truss_configs/<model_version_name>_<model_version_id>.
Examples
Deploy checkpoints interactively:
get_checkpoint_urls
Get presigned URLs for checkpoint artifacts.
Options
Job ID containing the checkpoints.
Examples
Get checkpoint URLs for a job:
checkpoints list
List and interactively explore checkpoints for a training job.
Options
Job ID to list checkpoints for. If omitted, defaults to the most recently created job. If multiple jobs exist and no --project-id or --project is provided, defaults to the most recently created job across all projects and prints its ID as a warning.
Project name or project ID.
Project ID.
Jump directly into a specific checkpoint’s file explorer.
Sort checkpoints by column. Options: checkpoint-id, size, created, type.
Sort order: asc (ascending) or desc (descending).
Output format: cli-table (interactive), csv, or json. Alias: -o.
Interactive mode
When using the default cli-table format in an interactive terminal, the command launches a checkpoint explorer:
- Checkpoint picker: fuzzy-search and select a checkpoint from the list.
- File explorer: navigate the checkpoint’s directory tree. Press → or Enter to open a directory or view a file. Press ← to go back. Press Ctrl-C to quit.
For .safetensors files, the explorer displays a tensor summary (layer names, dtypes, shapes, and parameter counts) instead of raw binary content. Text files display with syntax highlighting based on their file extension (for example, .json, .py, .yaml, .toml), falling back to plain text for unrecognized types.
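For scripted use, the csv and json output formats are likely more convenient than the interactive explorer. The sketch below builds a command that requests JSON through the -o alias and relies on the documented default of the most recently created job. Redirecting the result to a file assumes the command writes the formatted output to standard output, and the file name checkpoints.json is a hypothetical choice.

```shell
# Hypothetical output file name.
OUTPUT_FILE="checkpoints.json"

# Request machine-readable output via the -o alias; no job ID is given,
# so the most recently created job is used.
LIST_CMD="truss train checkpoints list -o json"

# Preview the full invocation, including the redirect.
echo "$LIST_CMD > $OUTPUT_FILE"

# Uncomment to actually run it (assumes JSON is written to stdout):
# $LIST_CMD > "$OUTPUT_FILE"
```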
Examples
List checkpoints for the most recent job:
cache summarize
View a summary of the training cache for a project.
Arguments
Project name or project ID.
Options
Sort files by column. Options: filepath, size, modified, type, permissions.
Sort order: asc (ascending) or desc (descending).
Output format: cli-table, csv, or json. Alias: -o.
Examples
View cache summary:
isession
View or update interactive session details for a training job, including auth codes and connection status.
Options
Job ID to view interactive session details for.
Extend the session timeout by this many minutes.
Change the session trigger. Options: on_startup, on_failure, on_demand. Cannot be changed on on_startup sessions.
Output format: table or json.
Examples
View session details for a job:
update_session
Update the interactive session configuration on a running training job. At least one of --trigger or --timeout-minutes must be provided.
Arguments
Job ID of the training job to update.
Options
New trigger mode for the session. Options: on_startup, on_failure, on_demand.
Number of minutes before the interactive session times out.
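To illustrate the argument and the two options together, the sketch below assembles an update_session call that switches a job's session to on_demand and sets a 60-minute timeout. The job ID abc123 and the timeout value are hypothetical placeholders. The command is printed rather than executed.

```shell
# Hypothetical job ID; replace with the ID of a running training job.
JOB_ID="abc123"

# Hypothetical timeout in minutes.
TIMEOUT_MINUTES=60

# The job ID is a positional argument; at least one of --trigger or
# --timeout-minutes must be supplied.
UPDATE_CMD="truss train update_session $JOB_ID --trigger on_demand --timeout-minutes $TIMEOUT_MINUTES"

# Preview the command.
echo "$UPDATE_CMD"

# Uncomment to actually run it:
# $UPDATE_CMD
```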
Examples
Change the session trigger:
capacity
Show GPU capacity limits and current usage for the organization.
Options
Name of the remote to use.
Examples
View capacity for the default remote:
Ignore files and folders
Create a .truss_ignore file in your project root to exclude files from upload. Uses .gitignore syntax.
.truss_ignore
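As an illustration only, the script below writes a small .truss_ignore using standard .gitignore patterns. The specific entries (.venv/ and data/raw/ in particular) are hypothetical and should be replaced with whatever your project does not need to upload. It assumes the current directory is the project root and overwrites any existing .truss_ignore there.

```shell
# Write .truss_ignore in the current directory (assumed to be the project root).
# Note: this overwrites an existing .truss_ignore.
cat > .truss_ignore <<'EOF'
# Python bytecode caches
__pycache__/
*.pyc
# Local virtual environment (hypothetical directory name)
.venv/
# Scratch data not needed by the training job (hypothetical directory name)
data/raw/
EOF

# Show the resulting file.
cat .truss_ignore
```

Because the file uses .gitignore syntax, a trailing slash restricts a pattern to directories and * matches any run of characters other than a slash.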