Management
How to monitor, manage, and interact with your Baseten Training projects and jobs.
Once you have submitted training jobs, Baseten provides tools to manage your TrainingProjects and individual TrainingJobs. You can use the CLI or the API to manage your jobs.
TrainingProject Management
-
Listing Projects: To view all your training projects, run truss train view. This command lists all TrainingProjects you have access to, typically showing their names and IDs. It also shows all active jobs.
-
Viewing Jobs within a Project: To see all jobs associated with a specific project, run truss train view with the project's project-id, which you obtain when creating the project or from the output of truss train view without arguments.
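If you script these lookups, the two view commands above can be assembled in Python. This is a minimal sketch, not Baseten's own tooling: the helper name view_command is hypothetical, and the --project-id flag spelling is an assumption based on the project-id value named above, so check it against truss train view --help before relying on it.

```python
import shlex
from typing import List, Optional


def view_command(project_id: Optional[str] = None) -> List[str]:
    """Build the argument list for `truss train view`.

    ASSUMPTION: a project is selected with a `--project-id` flag. The text
    above names only the `project-id` value, not the exact flag spelling.
    """
    argv = ["truss", "train", "view"]
    if project_id is not None:
        argv += ["--project-id", project_id]
    return argv


if __name__ == "__main__":
    # "abc123" is a placeholder, not a real project ID.
    for argv in (view_command(), view_command("abc123")):
        print(shlex.join(argv))
```

With the Truss CLI installed and authenticated, the returned list can be passed to subprocess.run(argv, check=True) to execute the command.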
TrainingJob Management
After submitting a job with truss train push config.py, you receive a project_id and job_id.
-
Listing Jobs: As described above, you can list all jobs within a project by running truss train view with the project's project-id. The output typically shows job IDs, statuses, and creation times.
-
Checking Status and Retrieving Logs: To view the logs for a specific job, you can tail them in real time or fetch existing logs.
- To view logs for the most recently submitted job in the current context (e.g., if you just pushed a job from your current terminal directory), run truss train logs.
- To view logs for a specific job, run truss train logs with its job-id. Add --tail to follow the logs live.
-
Understanding Job Statuses: The truss train view and truss train logs commands help you track which status a job is in. For more on the job lifecycle, see the Lifecycle page.
-
Stopping a TrainingJob: If you need to stop a running job, use the stop command with the job's project ID and job ID. This transitions the job to the TRAINING_JOB_STOPPED state.
-
Understanding Job Outputs & Checkpoints:
- The primary outputs of a successful TrainingJob are model checkpoints (if checkpointing is enabled and configured).
- These checkpoints are stored by Baseten. Refer to the Checkpointing section in Core Concepts for how CheckpointingConfig works.
- When you are ready to deploy a model, you will specify which checkpoints to use. The model_name you assign during deployment (via DeployCheckpointsConfig) becomes the identifier for this trained model version derived from your specific job's checkpoints.
- You can see the available checkpoints for a job via the Training API.
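The log and stop steps above can be sketched as a small Python helper module for automation. Treat it as an illustration under stated assumptions, not the official interface: the --job-id and --project-id flag spellings and the truss train stop subcommand name are inferred from the job-id, project-id, and stop command mentioned in this section, and fetch_status is a hypothetical callable you would implement yourself (for example, by querying the Training API). Only TRAINING_JOB_STOPPED is a status name taken from this page.

```python
import time
from typing import Callable, List


def logs_command(job_id: str, tail: bool = False) -> List[str]:
    """Build the argument list for fetching (or following) a job's logs."""
    # ASSUMPTION: the job is selected with a `--job-id` flag.
    argv = ["truss", "train", "logs", "--job-id", job_id]
    if tail:
        argv.append("--tail")  # `--tail` is named in the text above
    return argv


def stop_command(project_id: str, job_id: str) -> List[str]:
    """Build the argument list for stopping a running job."""
    # ASSUMPTION: subcommand `stop` with `--project-id` and `--job-id` flags.
    return ["truss", "train", "stop",
            "--project-id", project_id, "--job-id", job_id]


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "TRAINING_JOB_STOPPED",
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `fetch_status` until it returns `target` or the timeout expires.

    `fetch_status` is a hypothetical, caller-supplied function that returns
    the job's current status string. Returns True if the target status was
    observed, False if the timeout elapsed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A typical flow would run stop_command(...) through subprocess.run, then call wait_for_status with your own status lookup to confirm the job reached TRAINING_JOB_STOPPED before cleaning up or resubmitting.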