The truss train command provides subcommands for managing the full training job lifecycle.
truss train [COMMAND] [OPTIONS]

init

Initialize a training project from templates or create an empty project.
truss train init [OPTIONS]

Options

--list-examples
List all available examples.
--target-directory
TEXT
--examples
TEXT
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Initialize a project from a template:
truss train init --examples qwen3-8b-lora-dpo-trl
Initialize multiple templates:
truss train init --examples qwen3-8b-lora-dpo-trl,qwen3-8b-lora-verl
List available templates:
truss train init --list-examples
Create an empty training project:
truss train init

push

Submit and run a training job.
truss train push [OPTIONS] CONFIG

Arguments

CONFIG
string
required
Path to the training configuration file (for example, config.py).

Options

--remote
TEXT
Remote to use.
--tail
Tail for status + logs after push.
--job-name
TEXT
Name of the training job.
--team
TEXT
Team name for the training project
The --team flag is only available if your organization has teams enabled. Contact us to enable teams, or see Teams for more information.
--interactive
on_startup | on_failure | on_demand
Interactive session trigger mode
--interactive-timeout-minutes
INTEGER
Interactive session timeout in minutes
--accelerator
TEXT
Accelerator type and count (e.g., H200:8)
--node-count
INTEGER
Number of compute nodes
--entrypoint
TEXT
Entrypoint command.
--priority
INTEGER
Job priority (higher values run first when capacity frees up).
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Submit a training job:
truss train push config.py
Submit and stream logs:
truss train push config.py --tail
Submit to a specific team:
truss train push config.py --team my-team-name
Submit with a custom job name:
truss train push config.py --job-name fine-tune-v1
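In automated pipelines, several of the options above are typically combined in one call. The following is a minimal sketch, not an official recipe: the function name `push_training_job` and its argument order are hypothetical, and only flags listed in the options above are used.

```shell
# Hypothetical helper: submit a training job from CI without prompts.
# The function name and argument order are illustrative assumptions;
# every flag comes from the push options documented above.
push_training_job() {
  local config="$1"       # path to the training config, for example config.py
  local job_name="$2"     # value passed to --job-name
  local accelerator="$3"  # accelerator type and count, for example H200:8

  truss train push "$config" \
    --job-name "$job_name" \
    --accelerator "$accelerator" \
    --non-interactive
}
```

A call such as `push_training_job config.py fine-tune-v1 H200:8` would then run the push with prompts disabled.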

logs

Fetch and stream logs from a training job.
truss train logs [OPTIONS]

Options

--remote
TEXT
Remote to use.
--project-id
TEXT
Project ID.
--project
TEXT
Project name or project id.
--job-id
TEXT
Job ID.
--tail
Tail for ongoing logs.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Stream logs for a specific job:
truss train logs --job-id abc123 --tail
View logs for a job without streaming:
truss train logs --job-id abc123

metrics

View real-time metrics for a training job, including CPU, GPU, and storage usage.
truss train metrics [OPTIONS]

Options

--project-id
TEXT
Project ID.
--project
TEXT
Project name or project id.
--job-id
TEXT
Job ID.
--remote
TEXT
Remote to use.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

View metrics for a specific job:
truss train metrics --job-id abc123

view

List training projects and jobs, or view details for a specific job. This command lists jobs in the TRAINING_JOB_PENDING state (waiting for GPU capacity) alongside other active jobs.
truss train view [OPTIONS]

Options

--project-id
TEXT
View training jobs for a project.
--project
TEXT
Project name or project id.
--job-id
TEXT
View a specific training job.
--remote
TEXT
Remote to use.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

List all training projects:
truss train view
View jobs in a specific project:
truss train view --project my-project
View details for a specific job:
truss train view --job-id abc123

stop

Stop a running or pending training job.
truss train stop [OPTIONS]

Options

--project-id
TEXT
Project ID.
--project
TEXT
Project name or project id.
--job-id
TEXT
Job ID.
--all
Stop all running jobs.
--remote
TEXT
Remote to use.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Stop a specific job:
truss train stop --job-id abc123
Stop all running jobs:
truss train stop --all

recreate

Recreate an existing training job with the same configuration.
truss train recreate [OPTIONS]

Options

--job-id
TEXT
Job ID of the training job to recreate.
--remote
TEXT
Remote to use.
--tail
Tail for status + logs after recreation.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Recreate a specific job:
truss train recreate --job-id abc123
Recreate and stream logs:
truss train recreate --job-id abc123 --tail

download

Download training job artifacts to your local machine.
truss train download [OPTIONS]

Options

--job-id
TEXT
Job ID.
--remote
TEXT
Remote to use.
--target-directory
DIRECTORY
Directory where the file should be downloaded. Defaults to current directory.
--no-unzip
Instructs truss to not unzip the folder upon download.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Download artifacts to current directory:
truss train download --job-id abc123
Download to a specific directory:
truss train download --job-id abc123 --target-directory ./downloads
Download without extracting:
truss train download --job-id abc123 --no-unzip
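When downloading artifacts for several jobs, it can help to keep each job in its own directory. The sketch below assumes that layout. The function name `download_job_artifacts` and the `./artifacts` default are hypothetical, and the directory is created up front in case the command expects it to exist.

```shell
# Hypothetical helper: download a job's artifacts into <base_dir>/<job_id>.
# The function name and the ./artifacts default are illustrative assumptions.
download_job_artifacts() {
  local job_id="$1"
  local base_dir="${2:-./artifacts}"
  local dest="$base_dir/$job_id"

  # Create the destination first so the download has somewhere to land.
  mkdir -p "$dest" || return 1

  truss train download \
    --job-id "$job_id" \
    --target-directory "$dest" \
    --non-interactive
}
```

For example, `download_job_artifacts abc123 ./runs` would place the artifacts under `./runs/abc123`.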

deploy_checkpoints

Deploy a trained model checkpoint to Baseten’s inference platform.
truss train deploy_checkpoints [OPTIONS]

Options

--project-id
TEXT
Project ID.
--project
TEXT
Project name or project id.
--job-id
TEXT
Job ID.
--config
TEXT
Path to a Python file that defines a DeployCheckpointsConfig.
--dry-run
Generate a truss config without deploying
--truss-config-output-dir
TEXT
Path to output the truss config to. If not provided, outputs to truss_configs/<model_version_name>_<model_version_id>, or to truss_configs/dry_run_<timestamp> for a dry run.
--remote
TEXT
Remote to use.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Deploy checkpoints interactively:
truss train deploy_checkpoints
Deploy checkpoints from a specific job:
truss train deploy_checkpoints --job-id abc123
Preview deployment without deploying:
truss train deploy_checkpoints --job-id abc123 --dry-run

Output

After a successful deployment, the command prints a clickable link to the deployment’s logs page.

get_checkpoint_urls

Get presigned URLs for checkpoint artifacts.
truss train get_checkpoint_urls [OPTIONS]

Options

--job-id
TEXT
Job ID.
--remote
TEXT
Remote to use.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Get checkpoint URLs for a job:
truss train get_checkpoint_urls --job-id abc123

checkpoints list

List and interactively explore checkpoints for a training job.
truss train checkpoints list [OPTIONS]

Options

--remote
TEXT
Remote to use.
--project-id
TEXT
Project ID.
--project
TEXT
Project name or project id.
--job-id
TEXT
Job ID.
--checkpoint-name
TEXT
Jump directly into a specific checkpoint’s files.
--sort
checkpoint-id | size | created | type
default:"created"
Sort checkpoints by checkpoint-id, size, created date, or type.
--order
asc | desc
default:"asc"
Sort order: ascending or descending.
-o, --output-format
cli-table | csv | json
default:"cli-table"
Output format: cli-table (default), csv, or json.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Interactive mode

When using the default cli-table format in an interactive terminal, the command launches a checkpoint explorer:
  1. Checkpoint picker: fuzzy-search and select a checkpoint from the list.
  2. File explorer: navigate the checkpoint’s directory tree. Press → or Enter to open a directory or view a file. Press ← to go back. Press Ctrl-C to quit.
For .safetensors files, the explorer displays a tensor summary (layer names, dtypes, shapes, and parameter counts) instead of raw binary content. Text files display with syntax highlighting based on their file extension (for example, .json, .py, .yaml, .toml), falling back to plain text for unrecognized types.

Examples

List checkpoints for the most recent job:
truss train checkpoints list
List checkpoints for a specific job:
truss train checkpoints list --job-id abc123
Jump directly into a checkpoint’s files:
truss train checkpoints list --job-id abc123 --checkpoint-name ckpt-001
Export checkpoint list as JSON:
truss train checkpoints list --job-id abc123 --output-format json
Sort by size descending:
truss train checkpoints list --job-id abc123 --sort size --order desc
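Because the interactive explorer only applies to the default cli-table format, the json and csv formats are the natural choice for scripts. The sketch below saves the listing to a file. It assumes the JSON is written to standard output, and it does not parse the result because the output schema is not described here. The function name `export_checkpoints` is hypothetical.

```shell
# Hypothetical helper: save a job's checkpoint listing as JSON.
# Assumes the command writes the JSON to standard output.
export_checkpoints() {
  local job_id="$1"
  local out_file="$2"   # destination file for the JSON listing

  truss train checkpoints list \
    --job-id "$job_id" \
    --output-format json \
    --sort size --order desc \
    --non-interactive > "$out_file"
}
```

For example, `export_checkpoints abc123 checkpoints.json` would write the largest-first listing to `checkpoints.json`.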

cache summarize

View a summary of the training cache for a project.
truss train cache summarize [OPTIONS] PROJECT

Arguments

PROJECT
string
required
Project name or project ID.

Options

--remote
TEXT
Remote to use.
--sort
filepath | size | modified | type | permissions
default:"filepath"
Sort files by filepath, size, modified date, file type, or permissions.
--order
asc | desc
default:"asc"
Sort order: ascending or descending.
-o, --output-format
cli-table | csv | json
default:"cli-table"
Output format: cli-table (default), csv, or json.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

View cache summary:
truss train cache summarize my-project
Sort by size descending:
truss train cache summarize my-project --sort size --order desc
Export as JSON:
truss train cache summarize my-project --output-format json

isession

View or update interactive session details for a training job, including auth codes and connection status.
truss train isession [OPTIONS]

Options

--job-id
TEXT
Job ID of the training job.
--remote
TEXT
Remote to use.
--update-timeout
INTEGER
Minutes to extend the session timeout by
--update-trigger
on_startup | on_failure | on_demand
Change the session trigger (cannot be changed on on_startup sessions)
--format
table | json
default:"table"
Output format (default: table)
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

View session details for a job:
truss train isession --job-id abc123
Extend session timeout:
truss train isession --job-id abc123 --update-timeout 60
Output as JSON:
truss train isession --job-id abc123 --format json

update_session

Update the interactive session configuration on a running training job. At least one of --trigger or --timeout-minutes must be provided.
truss train update_session [OPTIONS] JOB_ID

Arguments

JOB_ID
string
required
Job ID of the training job to update.

Options

--trigger
on_startup | on_failure | on_demand
When to create the interactive session: ‘on_startup’ creates on job start, ‘on_failure’ creates on job failure, ‘on_demand’ allows manual session creation.
--timeout-minutes
INTEGER
Number of minutes before the interactive session times out.
--remote
TEXT
Remote to use.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Change the session trigger:
truss train update_session abc123 --trigger on_startup
Update the session timeout:
truss train update_session abc123 --timeout-minutes 120
truss train update_session requires API support that may not be available in all environments. If you receive a 404 error, set the trigger mode at push time using --interactive on_startup or --interactive on_failure instead.
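Since at least one of --trigger or --timeout-minutes must be provided, a wrapper script can check for that before calling the CLI. The following is a minimal sketch under that assumption. The function name `update_session_checked`, its argument order, and its error message are hypothetical.

```shell
# Hypothetical wrapper: refuse to call update_session with neither option set.
# Arguments: JOB_ID [TRIGGER] [TIMEOUT_MINUTES]; pass "" to skip one.
update_session_checked() {
  local job_id="$1"
  local trigger="${2:-}"   # on_startup | on_failure | on_demand, or empty
  local timeout="${3:-}"   # minutes, or empty

  if [ -z "$trigger" ] && [ -z "$timeout" ]; then
    echo "error: provide a trigger, a timeout, or both" >&2
    return 2
  fi

  # Rebuild the positional parameters as the argument list for truss.
  set -- "$job_id"
  if [ -n "$trigger" ]; then set -- "$@" --trigger "$trigger"; fi
  if [ -n "$timeout" ]; then set -- "$@" --timeout-minutes "$timeout"; fi

  truss train update_session "$@"
}
```

For example, `update_session_checked abc123 "" 120` would update only the timeout, while `update_session_checked abc123` returns an error without contacting the API.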

workstation

Spin up an SSH workstation on Baseten training infrastructure.
truss train workstation [OPTIONS]

Options

--accelerator
H100 | H200
default:"H100"
GPU accelerator type (default: H100).
--gpu-count
INTEGER RANGE
Number of GPUs (1-8, default: 1). Mutually exclusive with --node-count.
--project-id
TEXT
Project name (default: workstation-accelerator).
--node-count
INTEGER RANGE
Number of nodes (each with 8 GPUs). Mutually exclusive with --gpu-count.
--orchestrator
slurm
default:"slurm"
Multi-node orchestrator (default: slurm).
--image
TEXT
Custom Docker base image (default: nvidia/cuda:12.8.1-devel-ubuntu24.04).
--enable-checkpointing
Enable checkpoint storage.
--checkpoint-path
TEXT
Path inside the container to save checkpoints.
--checkpoint-volume-size
INTEGER
Checkpoint volume size in GiB.
--checkpoint-from-job
TEXT
Job ID to load the latest checkpoint from.
--remote
TEXT
Remote to use.
--tail
Tail for status + logs after push.
--log
humanfriendly | W | WARNING | I | INFO | D | DEBUG
default:"humanfriendly"
Customizes logging.
--non-interactive
Disables interactive prompts. Use in CI or other automated execution contexts.

Examples

Launch a workstation with default settings:
truss train workstation
Launch a multi-GPU workstation:
truss train workstation --accelerator H200 --gpu-count 4
Launch a workstation with a custom base image:
truss train workstation --image pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime
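Because --gpu-count and --node-count are mutually exclusive, a launch script has to pick one sizing mode per call. The sketch below shows one way to keep the two apart. The function name `launch_workstation` and its `gpus`/`nodes` selector are hypothetical conventions, not part of the CLI.

```shell
# Hypothetical helper: launch a workstation sized by GPUs or by nodes, never both.
# Arguments: MODE (gpus|nodes) COUNT [ACCELERATOR]
launch_workstation() {
  local mode="$1"
  local count="$2"
  local accelerator="${3:-H100}"   # H100 or H200

  case "$mode" in
    gpus)
      # Single node with 1-8 GPUs.
      truss train workstation --accelerator "$accelerator" --gpu-count "$count"
      ;;
    nodes)
      # Multiple nodes, each with 8 GPUs.
      truss train workstation --accelerator "$accelerator" --node-count "$count"
      ;;
    *)
      echo "error: mode must be 'gpus' or 'nodes'" >&2
      return 2
      ;;
  esac
}
```

For example, `launch_workstation nodes 2 H200` would request two H200 nodes, and `launch_workstation gpus 4` would request four H100 GPUs on one node.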

capacity

Show GPU capacity limits and current usage for the organization.
truss train capacity [OPTIONS]

Options

--remote
TEXT
Name of the remote to use

Examples

View capacity for the default remote:
truss train capacity

Ignore files and folders

Create a .truss_ignore file in your project root to exclude files from upload. Uses .gitignore syntax.
.truss_ignore
# Python cache files
__pycache__/
*.pyc
*.pyo
*.pyd

# Type checking
.mypy_cache/

# Testing
.pytest_cache/

# Large data files
data/
*.bin
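A starter ignore file can also be generated from a script, for instance when scaffolding new projects. The following is a minimal sketch. The function name `write_truss_ignore` is hypothetical, and the patterns are a subset of the ones shown above.

```shell
# Hypothetical helper: create a starter .truss_ignore in a project root.
# An existing file is left untouched.
write_truss_ignore() {
  local project_root="${1:-.}"
  local ignore_file="$project_root/.truss_ignore"

  if [ -e "$ignore_file" ]; then
    echo "$ignore_file already exists; not overwriting" >&2
    return 0
  fi

  cat > "$ignore_file" <<'EOF'
# Python cache files
__pycache__/
*.pyc

# Large data files
data/
*.bin
EOF
}
```

Calling `write_truss_ignore .` from the project root writes the file once and leaves later manual edits in place.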