When cold starts happen
Cold starts occur in two scenarios:

- Scale-from-zero: When a deployment with zero active replicas receives its first request.
- Scaling events: When traffic increases and the autoscaler adds new replicas.
What contributes to cold start time
Cold start duration depends on several factors:

| Factor | Impact |
|---|---|
| Model loading | Loading model weights (10s–100s of GBs) — typically the dominant factor |
| Container pull | Downloading Docker image layers |
| Initialization | Running your model’s setup code |
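As a rough illustration of how these factors add up, the sketch below estimates cold start time from image size, weight size, and throughput. All numbers and names (`estimate_cold_start`, the throughput figures) are hypothetical and chosen only for the example, not measured on Baseten. It also assumes the three phases run one after another, which is a simplification: optimizations such as image streaming can overlap them.

```python
from dataclasses import dataclass


@dataclass
class ColdStartEstimate:
    """Per-phase cold start estimate, in seconds."""
    container_pull_s: float
    model_loading_s: float
    initialization_s: float

    @property
    def total_s(self) -> float:
        # Simplifying assumption: phases run sequentially with no overlap.
        return self.container_pull_s + self.model_loading_s + self.initialization_s


def estimate_cold_start(image_gb: float, weights_gb: float,
                        pull_gb_per_s: float, load_gb_per_s: float,
                        init_s: float) -> ColdStartEstimate:
    """Estimate each phase as size divided by throughput (hypothetical model)."""
    return ColdStartEstimate(
        container_pull_s=image_gb / pull_gb_per_s,
        model_loading_s=weights_gb / load_gb_per_s,
        initialization_s=init_s,
    )


# Hypothetical inputs: 8 GB image, 140 GB of weights, 15 s of setup code.
estimate = estimate_cold_start(image_gb=8, weights_gb=140,
                               pull_gb_per_s=1.0, load_gb_per_s=2.0, init_s=15)
share = estimate.model_loading_s / estimate.total_s
print(f"total: {estimate.total_s:.0f} s, model loading share: {share:.0%}")
```

With inputs like these, model loading accounts for most of the total, which matches the table's note that it is typically the dominant factor.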
Minimizing cold starts
Keep replicas warm
Set min_replica to at least 1 so that a replica is always ready to serve requests. This eliminates the cold start on the first request but increases cost, because you pay for the replica while it is idle.
Set min_replica ≥ 2 so that one replica can fail or go down during maintenance without causing cold starts.
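One way to change min_replica programmatically is through an HTTP API call. The sketch below only builds the request and does not send it. The endpoint path, the `Api-Key` authorization scheme, and the placeholder IDs are assumptions made for illustration, so check them against the Baseten API reference before use. The helper name `build_min_replica_request` is hypothetical.

```python
import json
import urllib.request

# Assumed base URL; confirm against the Baseten API reference.
API_BASE = "https://api.baseten.co/v1"


def build_min_replica_request(model_id: str, deployment_id: str,
                              min_replica: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that sets min_replica for a deployment."""
    if min_replica < 0:
        raise ValueError("min_replica must be zero or greater")
    # Assumed endpoint path for a deployment's autoscaling settings.
    url = (f"{API_BASE}/models/{model_id}"
           f"/deployments/{deployment_id}/autoscaling_settings")
    body = json.dumps({"min_replica": min_replica}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Api-Key {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Placeholder identifiers; replace with real values before sending.
request = build_min_replica_request("MODEL_ID", "DEPLOYMENT_ID",
                                    min_replica=2, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload first, and the same helper could be called on a schedule to raise min_replica ahead of a known traffic spike.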
Pre-warm before expected traffic
For predictable traffic spikes, increase min replicas before the expected load arrives.
Use longer scale-down delay
A longer scale-down delay keeps replicas warm during temporary traffic dips, so a brief lull does not trigger a scale-down followed by a cold start when traffic returns.
Platform optimizations
Baseten automatically applies several optimizations to reduce cold start times:

- Baseten Delivery Network (Recommended): The weights configuration optimizes cold starts by mirroring weights to Baseten’s infrastructure and caching them close to your model pods. See Baseten Delivery Network (BDN) for full configuration options.
- Network accelerator (Legacy): Parallelized byte-range downloads speed up model loading from Hugging Face, S3, GCS, and R2.
- Image streaming: Optimized images stream into nodes, allowing model loading to begin before the full download completes.
The tradeoff
Cold starts create a fundamental tradeoff between cost and latency:

| Approach | Cost | Latency |
|---|---|---|
| Scale to zero (min_replica: 0) | Lower: no cost when idle | Higher: first request waits for cold start |
| Always on (min_replica ≥ 1) | Higher: pay for idle replicas | Lower: no cold start on the first request |
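To make the tradeoff concrete, the sketch below compares the two approaches with simple arithmetic. The hourly price, idle hours, and cold start duration are hypothetical values picked for the example, and `monthly_idle_cost` is an illustrative helper, not part of any Baseten tooling.

```python
HOURLY_PRICE_USD = 2.0        # hypothetical price per replica-hour
IDLE_HOURS_PER_MONTH = 500.0  # hypothetical hours per month with no traffic
COLD_START_S = 90.0           # hypothetical cold start duration


def monthly_idle_cost(min_replica: int, hourly_price_usd: float,
                      idle_hours: float) -> float:
    """Cost of keeping min_replica replicas running while there is no traffic."""
    return min_replica * hourly_price_usd * idle_hours


def first_request_wait_s(min_replica: int, cold_start_s: float) -> float:
    """Extra wait for the first request after an idle period."""
    # With no warm replica, the first request waits for a full cold start.
    return cold_start_s if min_replica == 0 else 0.0


for min_replica in (0, 1, 2):
    cost = monthly_idle_cost(min_replica, HOURLY_PRICE_USD, IDLE_HOURS_PER_MONTH)
    wait = first_request_wait_s(min_replica, COLD_START_S)
    print(f"min_replica={min_replica}: idle cost ${cost:.2f}/month, "
          f"first-request wait {wait:.0f} s")
```

Plugging in your own price and traffic pattern shows how much each warm replica costs relative to the latency it saves.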
Next steps
- Autoscaling: Configure min replicas and scale-down delay.
- Traffic patterns: Pre-warming strategies for different traffic types.
- Troubleshooting: Diagnose cold start issues.