How to deploy checkpoints from Baseten Training jobs as usable models.
Once your TrainingJob has produced model checkpoints, you can deploy them as fully operational model endpoints.
To deploy checkpoints, first ensure you have a TrainingJob running with a checkpointing_config enabled. Your training code should write its checkpoints to the directory given by $BT_CHECKPOINT_DIR.
The contents of this directory are uploaded to Baseten’s storage and made immediately available for deployment.
(You can optionally specify a checkpoint_path in your checkpointing_config if you prefer to write to a specific directory.)
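As a minimal sketch of the training-side half of this, the snippet below writes a checkpoint into the directory named by $BT_CHECKPOINT_DIR. The helper names (checkpoint_dir, save_checkpoint), the checkpoint-<step> folder layout, the state.json file, and the local ./checkpoints fallback are all assumptions made for illustration; they are not part of the Baseten API.

```python
import json
import os
from pathlib import Path


def checkpoint_dir(default: str = "./checkpoints") -> Path:
    """Return the directory checkpoints should be written to, creating it if needed."""
    # BT_CHECKPOINT_DIR is the variable named in this guide. The local fallback
    # is an assumption so the script can also run outside a training job.
    path = Path(os.environ.get("BT_CHECKPOINT_DIR", default))
    path.mkdir(parents=True, exist_ok=True)
    return path


def save_checkpoint(step: int, state: dict) -> Path:
    """Write one checkpoint as JSON under a per-step folder (hypothetical layout)."""
    target = checkpoint_dir() / f"checkpoint-{step}"
    target.mkdir(parents=True, exist_ok=True)
    out = target / "state.json"
    out.write_text(json.dumps(state))
    return out
```

A real job would typically save framework-specific weights here instead of JSON; the only point is that every file lands under the checkpoint directory.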
To deploy your checkpoint(s) as a Deployment, you can:
- Run truss train deploy_checkpoints [--job-id <job_id>] and follow the setup wizard.
- Define your own config using the DeployCheckpointsConfig class (helpful for small changes that the wizard doesn't offer) and run truss train deploy_checkpoints --config <path_to_config_file>.
When deploy_checkpoints is run, truss constructs a deployment config.yml and stores it on disk in a temporary directory. To preserve or modify the resulting deployment config, copy it into a permanent directory and customize it as needed.
This file is the source of truth for the deployment and can be deployed independently via truss push. See deployments for more details.
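One way to preserve the generated deployment config is to copy the temporary directory to a permanent location before editing it. The sketch below assumes you already know the temporary directory's path; the function name preserve_deployment_config and both path arguments are hypothetical.

```python
import shutil
from pathlib import Path


def preserve_deployment_config(generated_dir: str, permanent_dir: str) -> Path:
    """Copy the generated deployment directory and return the copied config.yml path."""
    src = Path(generated_dir)
    if not (src / "config.yml").is_file():
        raise FileNotFoundError(f"no config.yml found in {src}")
    dest = Path(permanent_dir)
    # Copy the whole directory so any files that sit next to config.yml come along.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest / "config.yml"
```

After copying, the config.yml in the permanent directory can be edited and deployed with truss push.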
After a successful deployment, your model is live on Baseten, where you can run inference requests and evaluate performance. See Calling Your Model for more details.
To download the files you saved to the checkpointing directory, you can run truss train get_checkpoint_urls [--job-id=<job_id>] to get a JSON file containing presigned URLs for each training job.
The JSON file contains the following structure:
load()
function
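To download every file referenced by the presigned-URL JSON from get_checkpoint_urls, a rough sketch is shown below. It deliberately makes no assumption about the JSON schema: it walks the whole document and treats any string starting with http:// or https:// as a URL. The function names and the numbered output filenames are hypothetical.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def collect_urls(node) -> list:
    """Recursively gather every http(s) string found anywhere in parsed JSON."""
    if isinstance(node, str):
        return [node] if node.startswith(("http://", "https://")) else []
    if isinstance(node, dict):
        return [url for value in node.values() for url in collect_urls(value)]
    if isinstance(node, list):
        return [url for value in node for url in collect_urls(value)]
    return []


def download_all(json_path: str, out_dir: str) -> list:
    """Download each discovered URL into out_dir and return the saved paths."""
    data = json.loads(Path(json_path).read_text())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(collect_urls(data)):
        # Name files from the URL path (query string excluded), prefixed with
        # an index so files with the same basename do not overwrite each other.
        name = Path(urlparse(url).path).name or "file"
        dest = out / f"{index:04d}-{name}"
        urllib.request.urlretrieve(url, str(dest))
        saved.append(dest)
    return saved
```

If the JSON also contains links that are not checkpoint files, filter the result of collect_urls before downloading.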