Deployments and environments
Deployment lifecycle on Baseten
Baseten organizes the model lifecycle around two concepts: deployments and environments. Each has features suited to its role in that lifecycle.
Feature | Development deployment | Production environment | Custom environments |
---|---|---|---|
API: Deployment ID | ☑️ | ☑️ | ☑️ |
API: Model ID | - | ☑️ | - |
Live reload | ☑️ | - | - |
Scale to zero | ☑️ | ☑️ | ☑️ |
Full autoscaling | - | ☑️ | ☑️ |
Zero-downtime updates | - | ☑️ | ☑️ |
Deactivate | ☑️ | ☑️ | ☑️ |
Delete | ☑️ | - | ☑️ |
What is a development deployment?
A development deployment is designed to make it easier for you to iterate on your model. As such, development deployments have four special properties:
- Development deployments have live reload so you can patch changes onto the model server while it runs.
- Development deployments don’t have access to full autoscaling. They have a maximum of one replica and always scale to zero when not in use.
- Development deployments do not guarantee zero-downtime updates. A development deployment may be updated at any time, which may cause active requests to fail.
- Development deployments are always named development and cannot be renamed.
Live reload lets you use the Truss CLI to patch changes onto your running model server, rather than waiting for an entirely new deployment.
What is an environment?
Environments encapsulate deployments, enabling you to manage your model’s release cycles. By providing a stable URL and autoscaling settings, environments allow you to create repeatable release processes for your model, ensuring its quality, stability, and scalability before it reaches end users.
Let’s say you’ve made some changes to a model, and you want to better understand the efficacy of its outputs without changing any behavior in your user-facing app. To take advantage of environments, you can create an environment with a custom name (e.g., “staging”) and promote a candidate deployment to that environment. This deployment now receives any requests you make to the “staging” environment. Now, you can verify the quality of your changes before promoting the deployment to production.
Some common methods of verifying the quality of the deployment:
- Tests/Evals
- Manual testing in pre-production environment
- Canarying/Gradual rollout
- Shadow serving traffic
Once a deployment is promoted to an environment, including production, a few things change:
- The deployment can be called through the environment-specific endpoint.
- The environment has full access to autoscaling settings.
- Traffic ramp up can be enabled on the environment.
- Metrics can be exported for each environment.
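As a rough illustration of what an environment-specific endpoint looks like from client code, the sketch below builds a predict URL and auth header for a named environment. The hostname pattern, the paths, and the `Api-Key` header scheme are assumptions that are not stated on this page, and `predict_url` and `auth_headers` are hypothetical helper names; check the API reference for the exact values before relying on them.

```python
from typing import Dict
from urllib.parse import quote

# ASSUMPTION: URL pattern and header scheme are illustrative, not taken
# from this page. Verify them against the API reference.
BASE_URL = "https://model-{model_id}.api.baseten.co"


def predict_url(model_id: str, environment: str) -> str:
    """Build the (assumed) predict URL for one environment of a model."""
    base = BASE_URL.format(model_id=quote(model_id, safe=""))
    if environment == "production":
        # Assumed shorthand path for the default production environment.
        return f"{base}/production/predict"
    return f"{base}/environments/{quote(environment, safe='')}/predict"


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build the (assumed) authorization header for a predict request."""
    return {"Authorization": f"Api-Key {api_key}"}
```

Because the URL depends only on the model ID and the environment name, client code that targets "staging" keeps working unchanged when a new deployment is promoted into that environment.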
A production environment is just like any other environment, with a couple of differences:
- A production environment is designated for production use; you can’t create additional custom environments with the name “production.”
- A production environment cannot be deleted (unless you delete the entire model).
Environments API
Each model’s environment comes with its own:
- Predict endpoint
- Async inference endpoint
- Set of management endpoints for:
Promotion
Any deployment can be promoted to any environment, whether it is a development deployment, a published deployment, or a deployment that’s already in an environment.
- Ensure that you have created an environment. The production environment will exist by default for every model.
- Deployments can be promoted from the UI or via the REST API.
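A promotion through the REST API might be prepared as in the sketch below. The management host, the `/promote` path, the `deployment_id` body field, and the `Api-Key` header scheme are all assumptions that this page does not confirm, and `build_promote_request` is a hypothetical helper. The function only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_promote_request(
    model_id: str, environment: str, deployment_id: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) an assumed promote-to-environment request."""
    # ASSUMPTION: path and payload shape are illustrative; confirm them
    # in the REST API reference.
    url = (
        "https://api.baseten.co/v1/models/"
        f"{model_id}/environments/{environment}/promote"
    )
    body = json.dumps({"deployment_id": deployment_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to log or review the target environment first.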
Promoting a deployment to an environment
Promoting the development deployment to an environment triggers a three-step process:
- A new deployment is created, with a new deployment ID and name.
- The new deployment is allocated resources and started up.
- Once active, the new deployment becomes associated with the environment, replacing any previous deployment.
- If there was no previous deployment, the new deployment is created with standard autoscaling settings.
- If there was a previous deployment, the new deployment is created with the same autoscaling settings as the previous deployment. The previous deployment is demoted but keeps its ID and autoscaling settings, and is scaled to zero by default.
Promoting the development deployment to an environment does not change the development deployment’s ID, autoscaling settings, or activity status. You can continue to iterate on the development deployment with live reload.
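The promotion rules above can be summarized with a small in-memory model. This is a toy sketch of the described behavior, not Baseten's implementation: the class names, the ID format, and the replica counts standing in for "standard autoscaling settings" are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Autoscaling:
    # Placeholder values standing in for "standard" settings (hypothetical).
    min_replicas: int = 0
    max_replicas: int = 1


@dataclass
class Deployment:
    deployment_id: str
    autoscaling: Autoscaling
    scaled_to_zero: bool = False


class Model:
    def __init__(self) -> None:
        self._ids = count(1)
        self.development = Deployment("development", Autoscaling())
        # The production environment exists by default and starts empty.
        self.environments: Dict[str, Optional[Deployment]] = {"production": None}
        self.demoted: List[Deployment] = []

    def promote_development(self, env: str) -> Deployment:
        previous = self.environments[env]  # KeyError if env was never created
        # Inherit the previous deployment's settings, else use the defaults.
        settings = previous.autoscaling if previous else Autoscaling()
        # Step 1: a new deployment with a new ID is created.
        new = Deployment(f"deployment-{next(self._ids)}", settings)
        # Steps 2 and 3: once started, it replaces the previous deployment,
        # which keeps its ID and settings but is scaled to zero.
        if previous is not None:
            previous.scaled_to_zero = True
            self.demoted.append(previous)
        self.environments[env] = new
        return new
```

Note that `promote_development` never touches `self.development`, mirroring the point that the development deployment keeps its ID, settings, and activity status after a promotion.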
Promoting another published deployment to an environment
When promoting an already published deployment to an environment, keep the following in mind:
- The published deployment’s autoscaling settings will be updated to match the previous deployment in the environment.
- If the deployment is inactive, you must activate it and wait for it to start up before promoting it.
The previous deployment is demoted and joins other deployments in the deployment list, but keeps its deployment ID and autoscaling settings, and is scaled to zero by default.
Deploying directly to an environment
You can deploy a model directly to an environment, skipping the development stage and starting a promotion to any existing environment, by adding --environment to truss push.
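For scripted releases, the same command can be assembled and run from Python. The sketch below uses only the `--environment` flag named above; the helper names, the environment name "staging", and the model directory are hypothetical, and it assumes the `truss` executable is installed and on your PATH.

```python
import subprocess
from typing import List


def truss_push_command(environment: str) -> List[str]:
    """Return the argument list for pushing directly to an environment."""
    if not environment:
        raise ValueError("environment name must be non-empty")
    return ["truss", "push", "--environment", environment]


def push_to_environment(model_dir: str, environment: str) -> None:
    """Run the push from the model directory; raises if the CLI exits non-zero."""
    subprocess.run(truss_push_command(environment), cwd=model_dir, check=True)
```

Calling `push_to_environment("my_model", "staging")` is then equivalent to running `truss push --environment staging` inside `my_model/`.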
There can only be one active promotion per environment at any given time.
Canary deployments
Canary deployments allow you to ramp up traffic to existing environments, ensuring a smooth transition with minimal disruption to ongoing traffic.
Once traffic ramp-up is enabled and a new deployment is promoted, traffic shifts to it in 10 evenly spaced stages over a configurable time window. This gradual shift allows the deployment to scale in response to real-time demand, guided by your autoscaling settings, thus maintaining stability for existing users.
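To get a feel for the timing, the sketch below computes a ramp-up schedule for a given window. It assumes that each of the 10 stages moves an equal share of traffic and that the final stage lands at the end of the window; the page does not spell out either detail, and `ramp_schedule` is a hypothetical helper.

```python
from typing import List, Tuple

STAGES = 10  # number of stages stated above


def ramp_schedule(window_seconds: float, stages: int = STAGES) -> List[Tuple[float, int]]:
    """Return (seconds since promotion, percent of traffic on the new deployment)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    step = window_seconds / stages
    # ASSUMPTION: equal traffic increments, one per evenly spaced stage.
    return [
        (round(i * step, 3), round(100 * i / stages))
        for i in range(1, stages + 1)
    ]
```

With a 10-minute window (`ramp_schedule(600)`), this model advances one stage every 60 seconds, so the autoscaler has a full minute to react to each additional 10% of traffic.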
The traffic ramp-up can be enabled via the UI or REST API. If you cancel it, incoming traffic will revert to your existing deployment.
Check out our launch blog for more information.
Deactivating deployments
Any active deployment, including those in environments, can be deactivated.
- Deactivated deployments remain visible in the model dashboard.
- Deactivated deployments do not consume model resources.
- Requests to a deactivated deployment’s endpoints will result in a 404 error.
- A deactivated deployment can be manually activated at any time from the model dashboard.
If you’re regularly activating or deactivating deployments in response to traffic, consider using autoscaling with scale to zero instead.
Deleting deployments and environments
Any deployment or environment of a model can be deleted, except the production environment and the deployment currently in it. To delete the deployment that is in production, first promote a different deployment to production (or delete the entire model).
- Deleted deployments are removed from the model dashboard but still appear in your billing and usage dashboard.
- Deleted deployments do not consume model resources.
- Requests to a deleted deployment’s endpoints will result in a 404 error.
- Deleting a deployment is a permanent action and cannot be undone.
If you aren’t completely certain about deleting a deployment, consider deactivating it instead until you’re ready to delete.