Deploy Training Jobs

push

truss train push [OPTIONS] CONFIG

Deploy and run a training job.

  • CONFIG: Path to a training configuration file.

Options:

  • --remote (TEXT): Name of the remote in .trussrc to push to.
  • --tail: Tail status and logs after pushing the training job.
  • --help: Show this message and exit.
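
As an illustration of how the argument and options above combine, here is a minimal Python sketch that assembles the command line for a push. The helper name `build_push_command`, the config path, and the remote name are placeholders invented for this example and are not part of Truss. Only the flags listed above are used.

```python
import shlex


def build_push_command(config_path, remote=None, tail=False):
    """Assemble argv for `truss train push` (hypothetical helper, not a Truss API)."""
    cmd = ["truss", "train", "push"]
    if remote is not None:
        cmd += ["--remote", remote]
    if tail:
        cmd.append("--tail")
    cmd.append(config_path)  # CONFIG is the single positional argument
    return cmd


# Placeholder path and remote name; substitute your own values.
cmd = build_push_command("path/to/config", remote="my-remote", tail=True)
print(shlex.join(cmd))
```

To actually run the command, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where the `truss` CLI is installed.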

Monitor Training Jobs

logs

truss train logs [OPTIONS]

Fetch and display logs for a training job.

Options:

  • --remote (TEXT): Name of the remote in .trussrc.
  • --job-id (TEXT): Job ID to fetch logs from.
  • --tail: Continuously stream new logs.
  • --help: Show this message and exit.

metrics

truss train metrics [OPTIONS]

Get metrics for a training job.

Options:

  • --remote (TEXT): Name of the remote in .trussrc.
  • --job-id (TEXT): Job ID to fetch metrics from.
  • --help: Show this message and exit.
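
The `logs` and `metrics` commands share `--remote` and `--job-id`, and only `logs` lists `--tail`. The sketch below, assuming a hypothetical helper `build_monitor_command` and a placeholder job ID, builds both invocations and rejects `--tail` for `metrics` since the reference above does not list it there.

```python
import shlex


def build_monitor_command(subcommand, job_id=None, remote=None, tail=False):
    """Assemble argv for `truss train logs` or `truss train metrics`.

    Hypothetical helper for illustration; it only knows the flags listed above.
    """
    if subcommand not in ("logs", "metrics"):
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    if tail and subcommand != "logs":
        # --tail appears only in the option list for `logs`.
        raise ValueError("--tail is only listed for `logs`")
    cmd = ["truss", "train", subcommand]
    if remote is not None:
        cmd += ["--remote", remote]
    if job_id is not None:
        cmd += ["--job-id", job_id]
    if tail:
        cmd.append("--tail")
    return cmd


job_id = "JOB_ID"  # placeholder; use a real job ID
print(shlex.join(build_monitor_command("metrics", job_id=job_id)))
print(shlex.join(build_monitor_command("logs", job_id=job_id, tail=True)))
```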

view

truss train view [OPTIONS]

List and view training jobs.

Options:

  • --remote (TEXT): Name of the remote in .trussrc.
  • --project-id (TEXT): View training jobs for a specific project.
  • --job-id (TEXT): View details of a specific training job.
  • --help: Show this message and exit.
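
A short sketch of the three ways `view` can be narrowed, using only the options above. The helper `build_view_command` and the IDs are placeholders for illustration; what the bare command prints when no filter is given is not specified in this reference.

```python
import shlex


def build_view_command(project_id=None, job_id=None, remote=None):
    """Assemble argv for `truss train view` (hypothetical helper)."""
    cmd = ["truss", "train", "view"]
    if remote is not None:
        cmd += ["--remote", remote]
    if project_id is not None:
        cmd += ["--project-id", project_id]
    if job_id is not None:
        cmd += ["--job-id", job_id]
    return cmd


# Placeholder IDs; substitute real ones.
print(shlex.join(build_view_command()))                         # no filters
print(shlex.join(build_view_command(project_id="PROJECT_ID")))  # one project
print(shlex.join(build_view_command(job_id="JOB_ID")))          # one job
```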

Manage Training Jobs

stop

truss train stop [OPTIONS]

Stop a running training job.

Options:

  • --remote (TEXT): Name of the remote in .trussrc.
  • --project-id (TEXT): ID of the project that contains the training job to stop.
  • --job-id (TEXT): ID of the job to stop.
  • --all: Stop all running jobs.
  • --help: Show this message and exit.
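
The following sketch builds a `stop` invocation from the options above. It treats `--job-id` and `--all` as alternatives and refuses to combine them; that guard is an assumption made for this example, not documented CLI behavior. The helper name `build_stop_command` and the IDs are placeholders.

```python
import shlex


def build_stop_command(job_id=None, project_id=None, stop_all=False, remote=None):
    """Assemble argv for `truss train stop` (hypothetical helper)."""
    if stop_all and job_id is not None:
        # Guard chosen for this sketch only; the CLI may behave differently.
        raise ValueError("pass either job_id or stop_all, not both")
    cmd = ["truss", "train", "stop"]
    if remote is not None:
        cmd += ["--remote", remote]
    if project_id is not None:
        cmd += ["--project-id", project_id]
    if job_id is not None:
        cmd += ["--job-id", job_id]
    if stop_all:
        cmd.append("--all")
    return cmd


# Placeholder IDs; substitute real ones.
print(shlex.join(build_stop_command(job_id="JOB_ID", project_id="PROJECT_ID")))
print(shlex.join(build_stop_command(stop_all=True)))
```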

deploy_checkpoints

truss train deploy_checkpoints [OPTIONS]

Deploy model checkpoints from a training job.

Options:

  • --remote (TEXT): Name of the remote in .trussrc.
  • --project-id (TEXT): Project ID containing the checkpoints.
  • --job-id (TEXT): Job ID containing the checkpoints.
  • --config (TEXT): Path to a Python file defining a DeployCheckpointsConfig.
  • --dry-run: Generate a Truss config without deploying.
  • --help: Show this message and exit.
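
One plausible workflow is to run `deploy_checkpoints` with `--dry-run` first and repeat without it once the result looks right. The sketch below builds both command lines under that assumption; `build_deploy_checkpoints_command`, the IDs, and the config path are placeholders and not part of Truss.

```python
import shlex


def build_deploy_checkpoints_command(job_id=None, project_id=None, config=None,
                                     dry_run=False, remote=None):
    """Assemble argv for `truss train deploy_checkpoints` (hypothetical helper)."""
    cmd = ["truss", "train", "deploy_checkpoints"]
    if remote is not None:
        cmd += ["--remote", remote]
    if project_id is not None:
        cmd += ["--project-id", project_id]
    if job_id is not None:
        cmd += ["--job-id", job_id]
    if config is not None:
        # Path to a Python file defining a DeployCheckpointsConfig.
        cmd += ["--config", config]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Placeholder job ID and config path; substitute real ones.
common = dict(job_id="JOB_ID", config="path/to/deploy_config.py")
print(shlex.join(build_deploy_checkpoints_command(dry_run=True, **common)))
print(shlex.join(build_deploy_checkpoints_command(**common)))
```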