A training job in Baseten progresses through several states from creation to completion. Understanding these states helps you monitor and manage your training jobs effectively.

Job States

| State | Description | Active | Terminal |
|---|---|---|---|
| TRAINING_JOB_CREATED | Initial state when a job is first created. Baseten has received the training configuration and persisted it. | No | No |
| TRAINING_JOB_DEPLOYING | Baseten is deploying the job: provisioning compute resources and installing dependencies. | Yes | No |
| TRAINING_JOB_RUNNING | The training code is actively executing. | Yes | No |
| TRAINING_JOB_COMPLETED | The job finished successfully. Any checkpoints or artifacts have been saved and uploaded. | No | Yes |
| TRAINING_JOB_DEPLOY_FAILED | The job failed to deploy. Common causes are a bad image or a resource allocation issue. | No | Yes |
| TRAINING_JOB_FAILED | The job encountered an error and could not complete successfully. Check the logs for error details. | No | Yes |
| TRAINING_JOB_STOPPED | A user stopped the job manually. | No | Yes |
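If a script needs to decide whether a job is still in progress, the states above can be grouped in a small helper. This is a minimal sketch, assuming the grouping implied by this page: DEPLOYING and RUNNING are active, the four end states are terminal, and TRAINING_JOB_CREATED falls in neither group. The function name `state_kind` and the `pending` label are hypothetical and not part of Truss.

```shell
#!/bin/sh
# Sketch: classify a Baseten training job state string.
# ASSUMPTION: "pending" for TRAINING_JOB_CREATED is this sketch's own label,
# not a Baseten term.
state_kind() {
  case "$1" in
    TRAINING_JOB_DEPLOYING|TRAINING_JOB_RUNNING)
      echo "active" ;;
    TRAINING_JOB_COMPLETED|TRAINING_JOB_DEPLOY_FAILED|TRAINING_JOB_FAILED|TRAINING_JOB_STOPPED)
      echo "terminal" ;;
    TRAINING_JOB_CREATED)
      echo "pending" ;;
    *)
      # Anything not listed on this page.
      echo "unknown" ;;
  esac
}

# Example: print a few states alongside their kind, tab-separated.
for s in TRAINING_JOB_CREATED TRAINING_JOB_RUNNING TRAINING_JOB_STOPPED; do
  printf '%s\t%s\n' "$s" "$(state_kind "$s")"
done
```

Returning `unknown` rather than failing keeps the helper safe to call on state names a newer CLI version might introduce.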

State Transitions

Jobs typically progress through states in the following order:

  1. TRAINING_JOB_CREATED → TRAINING_JOB_DEPLOYING: Automatic transition once resources are allocated
  2. TRAINING_JOB_DEPLOYING → TRAINING_JOB_RUNNING: Automatic transition once environment setup is complete
  3. TRAINING_JOB_RUNNING → TRAINING_JOB_COMPLETED: Automatic transition upon successful completion

A job may enter TRAINING_JOB_FAILED from any non-terminal state if an error occurs, and a job that cannot be deployed ends in TRAINING_JOB_DEPLOY_FAILED. TRAINING_JOB_STOPPED can be entered from any active state (DEPLOYING or RUNNING) when a user stops the job manually.
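The transition rules above can be expressed as a small validator, which is handy for sanity-checking state changes you record in your own tooling. This is a sketch, not Baseten's implementation. It assumes TRAINING_JOB_FAILED is reachable from any non-terminal state, TRAINING_JOB_DEPLOY_FAILED only from DEPLOYING, TRAINING_JOB_STOPPED only from DEPLOYING or RUNNING, and that terminal states have no outgoing transitions. `is_valid_transition` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: return 0 if moving from state $1 to state $2 is allowed, 1 otherwise.
# ASSUMPTIONS: FAILED is reachable from any non-terminal state,
# DEPLOY_FAILED only from DEPLOYING, STOPPED only from DEPLOYING or RUNNING,
# and terminal states have no outgoing transitions.
is_valid_transition() {
  case "$1" in
    TRAINING_JOB_CREATED)
      case "$2" in
        TRAINING_JOB_DEPLOYING|TRAINING_JOB_FAILED) return 0 ;;
      esac
      ;;
    TRAINING_JOB_DEPLOYING)
      case "$2" in
        TRAINING_JOB_RUNNING|TRAINING_JOB_DEPLOY_FAILED|TRAINING_JOB_FAILED|TRAINING_JOB_STOPPED) return 0 ;;
      esac
      ;;
    TRAINING_JOB_RUNNING)
      case "$2" in
        TRAINING_JOB_COMPLETED|TRAINING_JOB_FAILED|TRAINING_JOB_STOPPED) return 0 ;;
      esac
      ;;
  esac
  # Terminal or unknown source state, or a target not listed above.
  return 1
}

# Example: a finished job should never move back to RUNNING.
if is_valid_transition TRAINING_JOB_COMPLETED TRAINING_JOB_RUNNING; then
  echo "allowed"
else
  echo "not allowed"
fi
```

Nesting one `case` per source state keeps each state's outgoing edges in one place, mirroring the numbered list of transitions.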

You can monitor these state transitions using the following CLI commands:

truss train view # shows all active jobs
truss train view --job-id <your_job_id> # shows a specific job

Or track a specific job’s progress with:

truss train logs --job-id <your_job_id> --tail
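To block a script until a job finishes, you can poll `truss train view --job-id` in a loop. This sketch assumes that command prints the job's current state name (for example TRAINING_JOB_RUNNING) somewhere in its output; this page does not document the output format, so check it against your CLI version before relying on it. `wait_for_terminal`, `fetch_status`, and the script name `wait_for_job.sh` are hypothetical.

```shell
#!/bin/sh
# Poll a Baseten training job until it reaches a terminal state.
# ASSUMPTION: `truss train view --job-id <id>` prints the job's current
# state name somewhere in its output. Verify this against your CLI version.

# Fetch the raw status text for one job (hypothetical helper).
fetch_status() {
  truss train view --job-id "$1"
}

# Print the first terminal state found and return 0.
# Returns 2 if the status command itself fails.
wait_for_terminal() {
  job_id="$1"
  interval="${2:-30}"   # seconds between polls (arbitrary default)
  while true; do
    out="$(fetch_status "$job_id")" || return 2
    state="$(printf '%s\n' "$out" \
      | grep -Eo 'TRAINING_JOB_(COMPLETED|DEPLOY_FAILED|FAILED|STOPPED)' \
      | head -n 1)"
    if [ -n "$state" ]; then
      echo "$state"
      return 0
    fi
    sleep "$interval"
  done
}

# Usage: ./wait_for_job.sh <your_job_id> [interval_seconds]
if [ "$#" -gt 0 ]; then
  wait_for_terminal "$@"
fi
```

The caller can then branch on the printed state, for example treating TRAINING_JOB_COMPLETED as success and any other terminal state as a failure to inspect with `truss train logs`.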