For deploying weights from external cloud storage (GCS, S3, Azure), see Deploy from cloud storage.
Checkpoint reference
The `repo` and `revision` fields in `checkpoint_repository` specify which training project and checkpoint to deploy.
- `repo`: Your Baseten Training project name.
- `revision`: Which job and checkpoint to target. The following formats are supported:
| `revision` value | Deploys |
|---|---|
| `<job_id>/<checkpoint_name>` | A specific checkpoint from a specific job (for example, `abc123/checkpoint-100`) |
| `<job_id>` | The latest checkpoint from a specific job |
| `latest` or omitted | The latest checkpoint from the latest job |
LLM deployment
Use Engine-Builder-LLM or BIS-LLM to deploy a fine-tuned language model. Set `base_model` to `decoder`:
config.yaml
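As a sketch of the shape (the `trt_llm.build` nesting and the project and job names here are illustrative assumptions; see the Engine-Builder-LLM configuration reference for the authoritative schema):

```yaml
# Illustrative sketch: deploy the latest checkpoint of one training job
# as a decoder (LLM) engine. Field nesting under trt_llm.build is assumed.
trt_llm:
  build:
    base_model: decoder          # fine-tuned language model
    checkpoint_repository:
      repo: my-training-project  # your Baseten Training project name (placeholder)
      revision: abc123           # <job_id>: latest checkpoint from that job
```

Using `revision: abc123/checkpoint-100` instead would pin the deployment to one specific checkpoint, and omitting `revision` would track the latest checkpoint of the latest job.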
Embeddings deployment
Use BEI to deploy a fine-tuned embedding or reranker model. Use `encoder_bert` for BERT-based models (sentence-transformers, rerankers, classifiers) or `encoder` for causal embedding models:
config.yaml
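A minimal sketch for a BERT-based sentence-transformer checkpoint (the `trt_llm` nesting, the placement of `webserver_default_route`, and the project name are assumptions; see the BEI reference configuration for the exact schema):

```yaml
# Illustrative sketch: deploy an embedding checkpoint with BEI.
# Field nesting is assumed; consult the BEI reference configuration.
trt_llm:
  build:
    base_model: encoder_bert     # use `encoder` for causal embedding models
    checkpoint_repository:
      repo: my-training-project  # your Baseten Training project name (placeholder)
      revision: latest           # latest checkpoint from the latest job
  runtime:
    webserver_default_route: /v1/embeddings  # embedding endpoint
```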
- No tensor parallelism: Omit `tensor_parallel_count` or set it to `1`.
- Fast tokenizer required: Your checkpoint must include a `tokenizer.json` file. Models using only the legacy `vocab.txt` format aren't supported.
- Embedding model files: For sentence-transformer models, include `modules.json` and `1_Pooling/config.json` in your checkpoint.
The `webserver_default_route` field sets the inference endpoint path:
- `/v1/embeddings`: For embedding models.
- `/rerank`: For rerankers.
- `/predict`: For classifiers.
- `/predict_tokens`: For token-level prediction.
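For example, a reranker checkpoint would pair `encoder_bert` with the `/rerank` route (field placement is an assumption; see the BEI reference configuration):

```yaml
# Illustrative fragment: route selection for a reranker (nesting assumed).
trt_llm:
  build:
    base_model: encoder_bert
  runtime:
    webserver_default_route: /rerank  # reranker endpoint
```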
Related
- Engine-Builder-LLM configuration: Complete build and runtime options for LLMs.
- BEI reference configuration: Complete configuration for encoder models.
- Deploy from cloud storage: GCS, S3, and Azure deployment using `checkpoint_repository`.
- Baseten Training overview: Training jobs, checkpoints, and the full training workflow.
- Secrets management: Configure credentials for private storage.