Lifecycle
Understanding the different states and transitions in a Baseten training job’s lifecycle.
A training job in Baseten progresses through several states from creation to completion. Understanding these states helps you monitor and manage your training jobs effectively.
Job States
State | Description | Active | Terminal |
---|---|---|---|
TRAINING_JOB_CREATED | Initial state when a job is first created. Baseten has received the training configuration and persisted it to our records. | ✅ | |
TRAINING_JOB_DEPLOYING | Baseten is deploying the job, including provisioning compute resources and installing dependencies. | ✅ | |
TRAINING_JOB_RUNNING | The training code is actively executing. | ✅ | |
TRAINING_JOB_COMPLETED | The job has successfully finished execution. Any checkpoints or artifacts have been saved and uploaded. | | ✅ |
TRAINING_JOB_DEPLOY_FAILED | The job failed to deploy. This is likely due to a bad image or a resource allocation issue. | | ✅ |
TRAINING_JOB_FAILED | The job encountered an error and could not complete successfully. Check the logs for error details. | | ✅ |
TRAINING_JOB_STOPPED | The job was manually stopped by a user. | | ✅ |
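
When scripting around job status, it can help to encode the active/terminal split explicitly. The following is a minimal sketch, not part of any Baseten SDK: the `TrainingJobState` enum, the two sets, and the `is_terminal` helper are hypothetical names introduced here. It assumes the three in-progress states are active and the remaining four are terminal.

```python
from enum import Enum


class TrainingJobState(str, Enum):
    """Hypothetical enum; the values are the state names from the table."""

    CREATED = "TRAINING_JOB_CREATED"
    DEPLOYING = "TRAINING_JOB_DEPLOYING"
    RUNNING = "TRAINING_JOB_RUNNING"
    COMPLETED = "TRAINING_JOB_COMPLETED"
    DEPLOY_FAILED = "TRAINING_JOB_DEPLOY_FAILED"
    FAILED = "TRAINING_JOB_FAILED"
    STOPPED = "TRAINING_JOB_STOPPED"


# Assumption: a job in an active state can still change state.
ACTIVE_STATES = frozenset(
    {
        TrainingJobState.CREATED,
        TrainingJobState.DEPLOYING,
        TrainingJobState.RUNNING,
    }
)

# Every other state is treated as terminal (no further transitions).
TERMINAL_STATES = frozenset(TrainingJobState) - ACTIVE_STATES


def is_terminal(state: str) -> bool:
    """Return True if the given state name is a terminal state.

    Raises ValueError for a name that is not one of the seven states.
    """
    return TrainingJobState(state) in TERMINAL_STATES
```

A script can call `is_terminal(...)` on a reported state string to decide whether it is worth checking the job again.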
State Transitions
Jobs typically progress through states in the following order:
1. TRAINING_JOB_CREATED → TRAINING_JOB_DEPLOYING: Automatic transition once resources are allocated
2. TRAINING_JOB_DEPLOYING → TRAINING_JOB_RUNNING: Automatic transition once environment setup is complete
3. TRAINING_JOB_RUNNING → TRAINING_JOB_COMPLETED: Automatic transition upon successful completion
A job may enter TRAINING_JOB_FAILED from any state if an error occurs. Similarly, TRAINING_JOB_STOPPED can be entered from any active state (DEPLOYING or RUNNING) when manually stopped.
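
The transitions described above can be written down as a small lookup table. This is an illustrative sketch only: `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, and two details are assumptions rather than statements from this page, namely that TRAINING_JOB_DEPLOY_FAILED is reached only from TRAINING_JOB_DEPLOYING and that terminal states have no outgoing transitions.

```python
from typing import Dict, FrozenSet

# Map each non-terminal state to the states it may move to next.
# Terminal states are absent from the map: assumed to have no successors.
ALLOWED_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "TRAINING_JOB_CREATED": frozenset(
        {
            "TRAINING_JOB_DEPLOYING",  # normal progression
            "TRAINING_JOB_FAILED",  # an error can occur in any state
        }
    ),
    "TRAINING_JOB_DEPLOYING": frozenset(
        {
            "TRAINING_JOB_RUNNING",  # normal progression
            "TRAINING_JOB_DEPLOY_FAILED",  # assumption: only from DEPLOYING
            "TRAINING_JOB_FAILED",
            "TRAINING_JOB_STOPPED",  # manual stop
        }
    ),
    "TRAINING_JOB_RUNNING": frozenset(
        {
            "TRAINING_JOB_COMPLETED",  # normal progression
            "TRAINING_JOB_FAILED",
            "TRAINING_JOB_STOPPED",  # manual stop
        }
    ),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` fits the table."""
    return target in ALLOWED_TRANSITIONS.get(current, frozenset())
```

For example, `can_transition("TRAINING_JOB_RUNNING", "TRAINING_JOB_STOPPED")` is true, while a stop from TRAINING_JOB_CREATED is rejected because the text lists only DEPLOYING and RUNNING as stoppable.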
You can monitor these state transitions using the CLI command:
Or track a specific job’s progress with:
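
Separately from the CLI, a script can watch a job by polling its state until it becomes terminal. The sketch below is not a Baseten API: `wait_for_terminal` is a hypothetical helper, and `fetch_state` stands in for whatever callable you supply to return the job's current state name (how that value is obtained is left open here).

```python
import time
from typing import Callable, List

# State names from the Job States table that end a job's lifecycle.
TERMINAL_STATES = frozenset(
    {
        "TRAINING_JOB_COMPLETED",
        "TRAINING_JOB_DEPLOY_FAILED",
        "TRAINING_JOB_FAILED",
        "TRAINING_JOB_STOPPED",
    }
)


def wait_for_terminal(
    fetch_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll `fetch_state` until it reports a terminal state.

    Returns the distinct states observed, in order. Raises TimeoutError
    if no terminal state is seen before the deadline.
    """
    observed: List[str] = []
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state()
        # Record a state only when it differs from the last one seen.
        if not observed or observed[-1] != state:
            observed.append(state)
        if state in TERMINAL_STATES:
            return observed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in {state} after {timeout_seconds}s")
        sleep(poll_seconds)
```

Passing `fetch_state` as a parameter keeps the loop independent of how the state is retrieved, and the injectable `sleep` makes the helper easy to exercise without real delays.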