Rolling promotions replace replicas incrementally when promoting a deployment to an environment. Instead of swapping all traffic at once, rolling promotions scale up the candidate deployment, shift traffic proportionally, and scale down the previous deployment in controlled steps. Use rolling promotions when you need zero-downtime deployments with the ability to pause, cancel, or force-complete the promotion at any point.
Autoscaling is disabled for the entire duration of a rolling promotion. Replica counts do not adjust automatically until the promotion reaches a terminal status (SUCCEEDED, FAILED, or CANCELED). Use the replica_overhead_percent setting to pre-provision additional capacity before the promotion starts.

How rolling promotions work

A rolling promotion follows a repeating three-step cycle:
  1. Scale up candidate deployment replicas by the configured percentage.
  2. Shift traffic proportionally to match the new replica ratio.
  3. Scale down the previous deployment replicas by the same percentage.
This cycle repeats until all traffic and replicas run on the candidate deployment, at which point it becomes the active deployment in the environment.
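The cycle above can be sketched as a small shell simulation. This is only an illustration under stated assumptions, not Baseten's scheduler: it assumes every step moves a fixed percentage of the total and that the final step is capped at 100%. The function name simulate_rollout and its step_percent argument are hypothetical.

```shell
# Hypothetical simulation of the three-step cycle; not Baseten's actual scheduler.
# Assumption: every step moves step_percent of the total, capped at 100%.
simulate_rollout() {
  local step_percent=$1
  local traffic=0
  if [ "$step_percent" -le 0 ]; then
    echo "step_percent must be positive" >&2
    return 1
  fi
  while [ "$traffic" -lt 100 ]; do
    # One cycle: scale up the candidate, shift traffic, scale down the previous deployment.
    traffic=$((traffic + step_percent))
    if [ "$traffic" -gt 100 ]; then
      traffic=100
    fi
    echo "candidate=${traffic}% previous=$((100 - traffic))%"
  done
}
```

Calling simulate_rollout 25 prints one line per cycle until the candidate holds 100% of traffic.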

Provisioning modes

Rolling promotions support two mutually exclusive provisioning modes. You must configure exactly one:
  • max_surge_percent: Scales up candidate replicas before scaling down previous replicas.
  • max_unavailable_percent: Scales down previous replicas before scaling up candidate replicas.
Exactly one of the two must be non-zero. Setting both to non-zero values is invalid, and so is setting both to zero.
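A client-side check for this rule might look like the following sketch. The helper name provisioning_mode is hypothetical, and the API may report invalid combinations differently.

```shell
# Hypothetical helper: prints the provisioning mode implied by the two settings,
# or fails when the combination is invalid.
provisioning_mode() {
  local max_surge=$1 max_unavailable=$2
  if [ "$max_surge" -gt 0 ] && [ "$max_unavailable" -eq 0 ]; then
    echo "max_surge"
  elif [ "$max_surge" -eq 0 ] && [ "$max_unavailable" -gt 0 ]; then
    echo "max_unavailable"
  else
    echo "invalid: exactly one of the two must be non-zero" >&2
    return 1
  fi
}
```

Running a check like this before sending the PATCH request surfaces a bad combination locally rather than through an API error.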

Enabling rolling promotions

Enable rolling promotions on any environment by updating the environment’s promotion settings. Rolling promotions are disabled by default.
curl -X PATCH \
  https://api.baseten.co/v1/models/{model_id}/environments/production \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "promotion_settings": {
      "rolling_deploy": true,
      "rolling_deploy_config": {
        "max_surge_percent": 10,
        "max_unavailable_percent": 0,
        "stabilization_time_seconds": 60,
        "replica_overhead_percent": 0
      }
    }
  }'
Once rolling promotions are enabled, any subsequent promotion to the environment uses the rolling promotion workflow.

Configuration reference

Configure rolling promotions through the rolling_deploy_config object in the environment’s promotion_settings.
  • max_surge_percent (integer, default: 10): Percentage of additional replicas to provision during each step. Set to 0 to use max unavailable mode instead. Range: 0–50.
  • max_unavailable_percent (integer, default: 0): Percentage of replicas that can be unavailable during each step. Set to 0 to use max surge mode instead. Range: 0–50.
  • stabilization_time_seconds (integer, default: 0): Seconds to wait after each traffic shift before proceeding to the next step. Use this to monitor metrics between steps. Range: 0–3600.
  • replica_overhead_percent (integer, default: 0): Percentage of additional replicas to pre-provision on the current deployment before the promotion starts. Compensates for autoscaling being disabled. Range: 0–500.
Additional promotion settings configured at the promotion_settings level:
  • rolling_deploy (boolean, default: false): Enables rolling promotions for the environment.
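The documented ranges can be checked locally before a request is sent. The sketch below is an assumption about how you might do that in a script. The function names in_range and validate_rolling_deploy_config are hypothetical and not part of any Baseten tooling.

```shell
# Succeeds when min <= value <= max (integers only).
in_range() {
  local value=$1 min=$2 max=$3
  [ "$value" -ge "$min" ] && [ "$value" -le "$max" ]
}

# Hypothetical client-side check mirroring the ranges in the reference above.
validate_rolling_deploy_config() {
  local max_surge=$1 max_unavailable=$2 stabilization=$3 overhead=$4
  in_range "$max_surge" 0 50 || { echo "max_surge_percent must be 0-50" >&2; return 1; }
  in_range "$max_unavailable" 0 50 || { echo "max_unavailable_percent must be 0-50" >&2; return 1; }
  in_range "$stabilization" 0 3600 || { echo "stabilization_time_seconds must be 0-3600" >&2; return 1; }
  in_range "$overhead" 0 500 || { echo "replica_overhead_percent must be 0-500" >&2; return 1; }
}
```

For example, validate_rolling_deploy_config 10 0 60 0 accepts the values used in the enabling example, while a max_surge_percent of 60 is rejected.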

Promotion statuses

The in_progress_promotion field on the environment detail endpoint tracks the current state of a promotion.
| Status | Description |
| --- | --- |
| RELEASING | Candidate deployment is building and initializing replicas. |
| RAMPING_UP | Scaling up candidate replicas and shifting traffic. |
| PAUSED | Promotion is paused at its current traffic split. Replicas stay at their current count. |
| RAMPING_DOWN | Graceful cancel in progress. Traffic is shifting back to the previous deployment. |
| SUCCEEDED | Promotion completed. The candidate is now the active deployment. Autoscaling resumes. |
| FAILED | Promotion failed. Traffic remains on the previous deployment. Autoscaling resumes. |
| CANCELED | Promotion was canceled. Traffic returned to the previous deployment. Autoscaling resumes. |
The in_progress_promotion object also includes percent_traffic_to_new_version, which reports the current percentage of traffic routed to the candidate deployment.
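A script can poll these fields until the promotion finishes. The following is a minimal sketch with several assumptions that the text above does not confirm: that a GET on the environment URL returns the environment detail, that in_progress_promotion exposes its state under a key named status, and that the field is absent or null when no promotion is running. It requires curl and jq, and the function names are hypothetical.

```shell
# Succeeds when the status is one of the documented terminal statuses.
is_terminal_status() {
  case "$1" in
    SUCCEEDED|FAILED|CANCELED) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumptions: GET on the environment URL returns the environment detail, and
# in_progress_promotion carries a "status" key. Requires curl and jq.
wait_for_promotion() {
  local model_id=$1 interval=${2:-15} body status percent
  while true; do
    body=$(curl -sS \
      "https://api.baseten.co/v1/models/${model_id}/environments/production" \
      -H "Authorization: Api-Key $BASETEN_API_KEY") || return 1
    status=$(printf '%s' "$body" | jq -r '.in_progress_promotion.status // "NONE"')
    percent=$(printf '%s' "$body" | jq -r '.in_progress_promotion.percent_traffic_to_new_version // 0')
    echo "status=${status} traffic_to_candidate=${percent}%"
    # Stop when no promotion is reported or a terminal status is reached.
    if [ "$status" = "NONE" ] || is_terminal_status "$status"; then
      return 0
    fi
    sleep "$interval"
  done
}
```

Keeping the terminal-status check in its own function lets the same rule be reused wherever a script needs to know that autoscaling has resumed.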

Promotion control actions

Pause

Pauses the promotion after the current step completes. Use this to inspect metrics or logs before proceeding.
curl -X POST \
  https://api.baseten.co/v1/models/{model_id}/environments/production/pause_promotion \
  -H "Authorization: Api-Key $BASETEN_API_KEY"

Resume

Resumes a paused promotion from where it left off.
curl -X POST \
  https://api.baseten.co/v1/models/{model_id}/environments/production/resume_promotion \
  -H "Authorization: Api-Key $BASETEN_API_KEY"

Cancel

Gracefully cancels the promotion. Traffic ramps back to the previous deployment and candidate replicas scale down.
curl -X POST \
  https://api.baseten.co/v1/models/{model_id}/environments/production/cancel_promotion \
  -H "Authorization: Api-Key $BASETEN_API_KEY"
The response reports a status of CANCELED for non-rolling promotions, which cancel instantly, or RAMPING_DOWN for rolling promotions, which roll back gracefully.

Force cancel

Immediately cancels the promotion and returns all traffic to the previous deployment. Use this when you need to roll back without waiting for the graceful ramp-down.
Force canceling may cause brief service disruption if the previous deployment is under-provisioned.
curl -X POST \
  https://api.baseten.co/v1/models/{model_id}/environments/production/force_cancel_promotion \
  -H "Authorization: Api-Key $BASETEN_API_KEY"

Force roll forward

Immediately completes the promotion, shifting all traffic to the candidate deployment. This works even if the promotion is in the process of rolling back.
Force rolling forward may promote an under-provisioned deployment if the candidate has not finished scaling up.
curl -X POST \
  https://api.baseten.co/v1/models/{model_id}/environments/production/force_roll_forward_promotion \
  -H "Authorization: Api-Key $BASETEN_API_KEY"
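The five control endpoints above share one URL pattern, so a script can wrap them in a single helper. This is a sketch for convenience only. The function names promotion_action_url and promotion_action are hypothetical, and the environment is fixed to production as in the examples above.

```shell
# Hypothetical helper: maps an action name to the control endpoints shown above.
promotion_action_url() {
  local model_id=$1 action=$2
  case "$action" in
    pause|resume|cancel|force_cancel|force_roll_forward)
      echo "https://api.baseten.co/v1/models/${model_id}/environments/production/${action}_promotion"
      ;;
    *)
      echo "unknown action: ${action}" >&2
      return 1
      ;;
  esac
}

# Sends the POST request for the chosen action.
promotion_action() {
  local url
  url=$(promotion_action_url "$1" "$2") || return 1
  curl -sS -X POST "$url" -H "Authorization: Api-Key $BASETEN_API_KEY"
}
```

Rejecting unknown action names before calling curl avoids sending a request to a misspelled endpoint during an incident.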

Autoscaling during promotions

To compensate for autoscaling being disabled during promotions:
  • Set replica_overhead_percent to pre-provision the current deployment before the promotion starts. For example, a value of 50 adds 50% more replicas to the current deployment before any traffic shifts.
  • Set stabilization_time_seconds to add a wait period between steps, giving you time to monitor metrics before the next traffic shift.
  • Factor in expected traffic when setting your environment’s min_replica and max_replica before starting the promotion.
Autoscaling resumes automatically when the promotion reaches a terminal status: SUCCEEDED, FAILED, or CANCELED.
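The effect of replica_overhead_percent can be estimated with simple arithmetic. The sketch below assumes fractional replicas round up, which the text above does not state. The function name replicas_with_overhead is hypothetical.

```shell
# Estimates the replica count after pre-provisioning.
# Assumption: fractional replicas round up; the rounding rule is not documented here.
replicas_with_overhead() {
  local current=$1 overhead_percent=$2
  # Integer ceiling of current * (100 + overhead_percent) / 100.
  echo $(( (current * (100 + overhead_percent) + 99) / 100 ))
}
```

With 4 replicas and a value of 50, the estimate is 6 replicas before any traffic shifts.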

Promotion cleanup

After a promotion completes, the promotion_cleanup_strategy setting controls what happens to the previous deployment.
  • SCALE_TO_ZERO: Scales the previous deployment to zero replicas. It remains available for reactivation. This is the default.
  • KEEP: Leaves the previous deployment running at its current replica count.
  • DEACTIVATE: Deactivates the previous deployment. It stops serving traffic and releases all resources.
Set it alongside your other promotion settings:
curl -X PATCH \
  https://api.baseten.co/v1/models/{model_id}/environments/production \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "promotion_settings": {
      "promotion_cleanup_strategy": "DEACTIVATE"
    }
  }'