When a Baseten Training job completes, Baseten automatically saves your checkpoints to Baseten storage. You can deploy any of them to an inference engine without downloading or re-uploading anything. Engine-Builder-LLM, BEI, and BIS-LLM all support this workflow.
For deploying weights from external cloud storage (GCS, S3, Azure), see Deploy from cloud storage.
Checkpoint reference
The repo and revision fields in checkpoint_repository specify which training project and checkpoint to deploy.
- repo: Your Baseten Training project name.
- revision: Which job and checkpoint to target. The following formats are supported:
| revision value | Deploys |
|---|---|
| `<job_id>/<checkpoint_name>` | A specific checkpoint from a specific job (for example, `abc123/checkpoint-100`) |
| `<job_id>` | The latest checkpoint from a specific job |
| `latest` or omitted | The latest checkpoint from the latest job |
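As an illustration, the three revision formats might look like this inside checkpoint_repository. This is a sketch: the job ID `abc123`, checkpoint name `checkpoint-100`, and project name `my-training-project` are placeholder values, and the surrounding schema is covered in the configuration references linked below.

```yaml
# A specific checkpoint from a specific job:
checkpoint_repository:
  repo: my-training-project          # placeholder project name
  revision: abc123/checkpoint-100
---
# The latest checkpoint from job abc123:
checkpoint_repository:
  repo: my-training-project
  revision: abc123
---
# The latest checkpoint from the latest job (revision may also be omitted):
checkpoint_repository:
  repo: my-training-project
  revision: latest
```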
LLM deployment
Use Engine-Builder-LLM or BIS-LLM to deploy a fine-tuned language model. Set base_model to decoder:
config.yaml
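A minimal sketch of what such a config.yaml could contain. Only base_model: decoder and the checkpoint_repository fields (repo, revision) come from this page; the model_name field, the nesting under trt_llm.build, and the placeholder values are assumptions to make the fragment concrete, so check the Engine-Builder-LLM configuration reference for the exact schema.

```yaml
model_name: my-finetuned-llm          # assumed field and placeholder name
trt_llm:                              # nesting assumed; verify against the reference
  build:
    base_model: decoder               # LLM deployment uses the decoder base model
    checkpoint_repository:
      repo: my-training-project       # your Baseten Training project name (placeholder)
      revision: abc123/checkpoint-100 # a specific job/checkpoint (placeholder)
```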
Embeddings deployment
Use BEI to deploy a fine-tuned embedding or reranker model. Use encoder_bert for BERT-based models (sentence-transformers, rerankers, classifiers) or encoder for causal embedding models:
config.yaml
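A minimal sketch for a BERT-based reranker, under the same caveat: base_model: encoder_bert, tensor_parallel_count, checkpoint_repository, and webserver_default_route are named on this page, but the nesting, the model_name field, and the placement of webserver_default_route are assumptions; consult the BEI reference configuration for the exact schema.

```yaml
model_name: my-finetuned-reranker     # assumed field and placeholder name
trt_llm:                              # nesting assumed; verify against the reference
  build:
    base_model: encoder_bert          # or encoder for causal embedding models
    tensor_parallel_count: 1          # BEI does not support tensor parallelism
    checkpoint_repository:
      repo: my-training-project       # your Baseten Training project name (placeholder)
      revision: latest                # latest checkpoint from the latest job
webserver_default_route: /rerank      # placement assumed; /rerank for rerankers
```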
- No tensor parallelism: Omit tensor_parallel_count or set it to 1.
- Fast tokenizer required: Your checkpoint must include a tokenizer.json file. Models using only the legacy vocab.txt format aren't supported.
- Embedding model files: For sentence-transformer models, include modules.json and 1_Pooling/config.json in your checkpoint.
The webserver_default_route field sets the inference endpoint path:
- /v1/embeddings: For embedding models.
- /rerank: For rerankers.
- /predict: For classifiers.
- /predict_tokens: For token-level prediction.
Related
- Engine-Builder-LLM configuration: Complete build and runtime options for LLMs.
- BEI reference configuration: Complete configuration for encoder models.
- Deploy from cloud storage: GCS, S3, and Azure deployment using checkpoint_repository.
- Baseten Training overview: Training jobs, checkpoints, and the full training workflow.
- Secrets management: Configure credentials for private storage.