Your first steps to creating and running training jobs on Baseten.
This guide walks through creating and running a TrainingJob using Baseten Training.
The truss package provides a Python-native way to define and run your training jobs. The CLI provides a convenient way to deploy and manage them. Install or update it with pip install --upgrade truss.
Training jobs are defined in a Python file, config.py. This file uses the truss package to specify all aspects of your TrainingProject and TrainingJob.
A simple example of a config.py file is shown below:
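The following is a minimal sketch of such a file, not a definitive template. It assumes the SDK is importable as truss_train.definitions and that accelerator types live in truss.base.truss_config; the project name, base image, and environment variable names are placeholders chosen for illustration. Check every class and field against the SDK reference before relying on it.

```python
# config.py: a sketch of a training configuration (names marked "placeholder"
# are illustrative and not taken from the original guide).
from truss.base import truss_config
from truss_train import definitions

PROJECT_NAME = "my-first-training-project"  # placeholder project name
BASE_IMAGE = "your-registry/your-training-image:tag"  # placeholder image
NUM_GPUS = 1

# Runtime: the commands run when the container starts, plus environment
# variables. Secrets are referenced by name and resolved from the workspace.
training_runtime = definitions.Runtime(
    start_commands=["/bin/sh -c './run.sh'"],
    environment_variables={
        "HF_TOKEN": definitions.SecretReference(name="hf_access_token"),
        "WANDB_API_KEY": definitions.SecretReference(name="wandb_api_key"),
    },
)

# Compute: which accelerator the job requests, and how many.
training_compute = definitions.Compute(
    accelerator=truss_config.AcceleratorSpec(
        accelerator=truss_config.Accelerator.H100,
        count=NUM_GPUS,
    ),
)

# The job ties together the image, compute, and runtime.
training_job = definitions.TrainingJob(
    image=definitions.Image(base_image=BASE_IMAGE),
    compute=training_compute,
    runtime=training_runtime,
)

# The project is the top-level object that the CLI reads when you push.
training_project = definitions.TrainingProject(
    name=PROJECT_NAME,
    job=training_job,
)
```

Keeping the runtime, compute, and job definitions as separate variables makes it easier to change one aspect (for example, the GPU count) without touching the rest.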
If you have local scripts (e.g., a train.py or a run.sh), helper files, or configuration files (e.g., an accelerate config), place them in the same directory as your config.py or in subdirectories. When you push the training job, truss packages these artifacts and uploads them. They are copied into the container at the root of the base image's working directory.
To exclude files from the upload, create a .truss_ignore file in the root directory of your project. In this file, you can add entries in a style similar to .gitignore. Refer to the CLI reference for more details.
Make sure any secrets referenced via SecretReference (e.g., hf_access_token, wandb_api_key) are defined in your Baseten workspace settings.
For the full set of options on the TrainingJob type, check out our SDK-reference.
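To make the idea of .gitignore-style entries concrete, here is a simplified sketch of how such patterns filter a file list. It is not truss's implementation: it uses only the Python standard library, the helper name is_ignored is hypothetical, and it handles just comments, directory patterns ending in a slash, and simple wildcards. Real .gitignore semantics (negation, anchoring, double-star globs) are richer.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if `path` matches any simplified ignore pattern."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pattern = pattern.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        if pattern.endswith("/"):
            # Directory pattern: ignore anything under a directory with that name.
            if any(fnmatch(part, pattern.rstrip("/")) for part in parts[:-1]):
                return True
        elif fnmatch(parts[-1], pattern) or fnmatch(path, pattern):
            # File pattern: match the file name or the full relative path.
            return True
    return False


# Hypothetical ignore entries and project files, for illustration only.
patterns = ["# local outputs", "checkpoints/", "*.ckpt", ".env"]
files = [
    "config.py",
    "run.sh",
    "checkpoints/step_100/model.bin",
    "notes/old.ckpt",
    ".env",
]
uploaded = [f for f in files if not is_ignored(f, patterns)]
print(uploaded)
```

With these example entries, only config.py and run.sh remain in the list, which mirrors the intent of keeping large local outputs and credentials out of the uploaded package.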
What runs in start_commands?
A run.sh script is used. An example might look like this:
Once your config.py and any local artifacts are ready, submit the training job using the truss CLI by running truss train push config.py.
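When you push, truss reads your config.py, packages the files in the directory containing config.py, and submits the TrainingJob under the TrainingProject specified in your config.
For next steps, see CheckpointingConfig, Training Cache, and Multinode.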