A/B Deployments
Overview
A/B deployments are a release process where a new deployment (the “candidate” or “B”) is gradually released to a subset of users alongside the existing deployment (the “production” or “A”). This controlled rollout helps validate the quality and performance of the new deployment before wider release.
Benefits
- Controlled Validation: Direct a small percentage of users to experience a new model, collecting both product and performance metrics before full rollout
- Traffic Management: Mitigate risks associated with traffic spikes to models that haven’t warmed up to full capacity
- Resource Optimization: Control the pace and capacity allocated to new models, helping manage resources during large migrations that would otherwise require nearly double the existing capacity
- Safety Net: Maintain an active fallback deployment in case issues arise with the new model
Compared with canary deployments, A/B rollouts are more manual but give you more direct control over the pace of the rollout.
Implementation Requirements
To implement A/B deployments with Baseten, you’ll need to create routing logic on your side that directs traffic to either deployment A (production) or B (candidate) based on a configurable threshold. Below are two approaches to implementing this routing logic.
Approach 1: Random Request Routing
This approach randomly routes each individual request to either deployment A or B based on a probability threshold.
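A minimal sketch of per-request random routing in Python, assuming the threshold is the fraction of traffic (between 0.0 and 1.0) sent to deployment B. The function name `choose_deployment` is hypothetical, and mapping the returned label to an actual URL is left to your client code.

```python
import random
from typing import Optional


def choose_deployment(threshold: float, rng: Optional[random.Random] = None) -> str:
    """Pick a deployment for a single request (hypothetical helper).

    Returns "B" (candidate) with probability `threshold`, otherwise "A" (production).
    `threshold` is the fraction of traffic sent to B, between 0.0 and 1.0.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    # random() is uniform on [0.0, 1.0), so the comparison is True for
    # about `threshold` of all calls; 0.0 never routes to B, 1.0 always does.
    draw = rng.random() if rng is not None else random.random()
    return "B" if draw < threshold else "A"
```

Because every request is routed independently, two requests from the same user can land on different deployments. When that matters, use session-based routing (Approach 2) instead.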
Approach 2: Session-Based Routing
For use cases requiring consistency across multiple requests, session-based routing ensures all related requests go to the same deployment. This approach is particularly valuable for:
- Contextual Continuity: Essential for multi-turn conversations or chat sessions where context is built up over time
- User Experience Consistency: Prevents jarring changes in model behavior or quality mid-session
- Quality Evaluation: Simplifies A/B testing by ensuring entire user journeys can be attributed to a specific model version
- Context Window Efficiency: For models that maintain conversation history, keeping users on the same deployment prevents redundant context transmission and increases the chance of a cache hit
Session-Based Routing Considerations
- Distribution Variance: The actual traffic distribution may not exactly match your target percentage, especially when some sessions contain more “events” than other sessions.
- Consistency: A given session ID should always route to the same deployment, even as you adjust the threshold. As long as you only increase the threshold, each session changes deployments at most once during the migration.
- Tracking: Record which deployment handled each request or session to facilitate analysis and troubleshooting.
- Entropy of Session ID: Use session IDs with enough entropy to spread sessions uniformly across deployments. Standard UUIDs work well.
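One way to meet the considerations above is to hash each session ID into a fixed number of buckets and send buckets below the threshold to deployment B. This is a sketch, not a Baseten API: `session_bucket`, `route_session`, `route_and_record`, the logger name, and the 10,000-bucket resolution are all hypothetical choices. Because a session's bucket never changes, raising the threshold only moves sessions from A to B.

```python
import hashlib
import logging

# Hypothetical resolution of the split: 10,000 buckets allows 0.01% steps.
BUCKETS = 10_000

logger = logging.getLogger("ab_routing")


def session_bucket(session_id: str) -> int:
    """Map a session ID to a stable bucket in [0, BUCKETS)."""
    # SHA-256 gives the same result in every process and on every machine,
    # unlike Python's built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route_session(session_id: str, threshold: float) -> str:
    """Return "B" if the session's bucket is below threshold * BUCKETS, else "A".

    A session's bucket is fixed, so raising `threshold` only moves sessions
    from A to B; each session switches deployments at most once.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return "B" if session_bucket(session_id) < threshold * BUCKETS else "A"


def route_and_record(session_id: str, threshold: float) -> str:
    """Route a session and log which deployment handled it, for later analysis."""
    deployment = route_session(session_id, threshold)
    logger.info("session=%s deployment=%s threshold=%.4f", session_id, deployment, threshold)
    return deployment
```

Note that this splits sessions, not requests: if some sessions send many more requests than others, the share of requests reaching B can still drift from the threshold, as described under Distribution Variance.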
Baseten Release Process with A/B Deployments
Once you’ve added A/B support to your code, follow these steps to roll out a new deployment.
Prerequisite: You should already have an existing model with a production deployment on Baseten.
- Deploy the new model version: Ensure it doesn’t have environment-specific dependencies that would cause behavioral differences from your production model (unless intended).
- Configure your routing URLs:
- Deployment A (Production): Use the production environment alias
https://{model-id}.api.baseten.co/environments/production/predict
- Deployment B (Candidate): Use the specific deployment ID
https://{model-id}.api.baseten.co/deployment/{deployment-id}/predict
- Scale appropriately: Pre-scale deployment B to handle its expected traffic share. For example, if testing with 10% of traffic, set the minimum replica count to at least 10% of deployment A’s capacity.
- Activate A/B routing: Enable your routing logic and begin directing traffic to both deployments.
- Monitor performance: Track both infrastructure metrics (through the Baseten Metrics dashboard) and product-level results (success rates, error rates, etc.) for each deployment.
- Manage capacity: As traffic shifts to deployment B, deployment A should scale down automatically if autoscaling is configured, freeing up capacity.
- Complete the migration: Repeat these steps as many times as you'd like, gradually increasing the percentage of traffic routed to your candidate deployment. When you're ready to switch fully, promote deployment B to production within Baseten. At this point:
- Both URLs point to the same deployment (one via the production alias, the other via the deployment ID).
- Update the minimum/maximum replica count for your new production deployment to match your previous production values, or let Baseten apply them automatically during promotion.
- Configure your routing to direct 100% of traffic to the production deployment using the production alias.
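The URL and scaling steps can be sketched as a few helper functions. This assumes the URL templates exactly as listed in the steps above, Baseten's `Authorization: Api-Key ...` header, and a JSON request body whose shape depends on your model; `deployment_urls`, `candidate_min_replicas`, and `predict` are hypothetical names, and the IDs you pass in are placeholders.

```python
import json
import urllib.request

# URL templates from the release steps; model_id and deployment_id are placeholders.
PRODUCTION_URL = "https://{model_id}.api.baseten.co/environments/production/predict"
CANDIDATE_URL = "https://{model_id}.api.baseten.co/deployment/{deployment_id}/predict"


def deployment_urls(model_id: str, deployment_id: str) -> dict:
    """Return predict URLs for deployment A (production alias) and B (candidate)."""
    return {
        "A": PRODUCTION_URL.format(model_id=model_id),
        "B": CANDIDATE_URL.format(model_id=model_id, deployment_id=deployment_id),
    }


def candidate_min_replicas(production_replicas: int, percent: int) -> int:
    """Minimum replicas for B so it covers at least `percent`% of A's capacity.

    Integer ceiling division rounds up and avoids floating-point surprises.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent == 0:
        return 0
    return max(1, (production_replicas * percent + 99) // 100)


def predict(url: str, payload: dict, api_key: str, timeout: float = 60.0) -> dict:
    """POST a JSON payload to a predict URL and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",  # assumed Baseten auth header format
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, with deployment A at 20 replicas and a 10% split, `candidate_min_replicas(20, 10)` returns 2. Pair `deployment_urls(...)` with your routing function to choose the URL for each request, and recompute B's minimum replicas before each increase in its traffic share.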
By following this guide, you can safely and effectively roll out new model deployments while minimizing risk and optimizing resource usage.