A deployment in Baseten is a containerized instance of a model that serves inference requests via an API endpoint. Deployments exist independently but can be promoted to an environment for structured access and scaling. Baseten automatically wraps every deployment in a REST API. Once deployed, models can be queried with a simple HTTP request:
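import requests

# Placeholders: replace with your own model ID, deployment ID, and API key.
model_id = "YOUR_MODEL_ID"
deployment_id = "YOUR_DEPLOYMENT_ID"

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/deployment/{deployment_id}/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"text": "Hello my name is {MASK}"},
)
resp.raise_for_status()  # raise on HTTP errors instead of parsing an error body

print(resp.json())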
Learn more about running inference on your deployment

Development deployment

A development deployment is a mutable instance designed for rapid iteration. Create one with truss push --watch (for models) or truss chains push --watch (for Chains). It always remains in the development state: you cannot rename it or detach it from that state. Key characteristics:
  • Live reload enables direct updates without redeployment.
  • Single replica, scales to zero when idle to conserve compute resources.
  • No autoscaling or zero-downtime updates.
  • Can be promoted to create a persistent deployment.
Once promoted, the development deployment transitions to a deployment and can optionally be promoted to an environment.

Environments and promotion

Environments provide logical isolation for managing deployments but are not required for a deployment to function. You can run a deployment independently or promote it to an environment for controlled traffic allocation and scaling.
  • The production environment exists by default.
  • Custom environments (e.g., staging) can be created for specific workflows.
  • Promoting a deployment doesn’t modify its behavior, only its routing and lifecycle management.
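Because promotion changes only routing, a client can address an environment instead of one specific deployment. The sketch below builds both kinds of URL. The deployment path follows the inference example at the top of this page; the /environments/{name}/predict path and both helper names are assumptions for illustration and should be checked against the inference API reference.

```python
def deployment_url(model_id: str, deployment_id: str) -> str:
    """Build a URL that targets one specific deployment by its ID."""
    return (
        f"https://model-{model_id}.api.baseten.co"
        f"/deployment/{deployment_id}/predict"
    )


def environment_url(model_id: str, environment: str = "production") -> str:
    """Build a URL that targets whichever deployment is currently promoted
    to `environment`. The path shape here is an assumption."""
    return (
        f"https://model-{model_id}.api.baseten.co"
        f"/environments/{environment}/predict"
    )


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(deployment_url("abc123", "dep456"))
    print(environment_url("abc123", "staging"))
```

Either URL can be passed to requests.post with the same Authorization header and JSON body. Calling the environment URL means client code does not need to change when a new deployment is promoted.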

Rolling deployments

Rolling deployments replace replicas incrementally when promoting a deployment to an environment. Instead of swapping all traffic at once, rolling deployments scale up the candidate, shift traffic proportionally, and scale down the previous deployment in controlled steps. You can pause, resume, cancel, or force-complete a rolling deployment at any point. See Rolling deployments for configuration, control actions, and status reference.
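To make the scale-up, shift, scale-down sequence concrete, here is a small simulation. It is an illustrative model only, not Baseten's scheduler: the step_pct parameter, the rounding rule, and the names RolloutStep and plan_rollout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RolloutStep:
    candidate_replicas: int     # replicas running the new deployment
    previous_replicas: int      # replicas still running the old deployment
    candidate_traffic_pct: int  # share of traffic sent to the candidate


def _ceil_share(total: int, pct: int) -> int:
    """Replicas needed to carry `pct` percent of `total`, rounded up."""
    return (total * pct + 99) // 100


def plan_rollout(total_replicas: int, step_pct: int) -> list[RolloutStep]:
    """Plan a rollout that moves `step_pct` of traffic per step.

    Both sides are rounded up, so combined capacity never drops below
    what the current traffic split requires (an assumption of this sketch).
    """
    if total_replicas < 1 or not 1 <= step_pct <= 100:
        raise ValueError("need total_replicas >= 1 and 1 <= step_pct <= 100")
    steps = []
    pct = 0
    while pct < 100:
        pct = min(100, pct + step_pct)
        steps.append(
            RolloutStep(
                candidate_replicas=_ceil_share(total_replicas, pct),
                previous_replicas=_ceil_share(total_replicas, 100 - pct),
                candidate_traffic_pct=pct,
            )
        )
    return steps


if __name__ == "__main__":
    for step in plan_rollout(total_replicas=4, step_pct=25):
        print(step)
```

In this model, pausing a rollout corresponds to holding at one step, and cancelling corresponds to walking the steps in reverse.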

Canary deployments (deprecated)

Canary deployments are deprecated. Use rolling deployments for incremental traffic shifting with finer control over replica provisioning and rollback.
Canary deployments support incremental traffic shifting to a new deployment in 10 evenly distributed stages over a configurable time window. Enable or cancel canary rollouts via the UI or REST API.
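As a rough illustration of "10 evenly distributed stages", the sketch below computes a schedule for a given time window. It assumes each stage adds an equal 10% share of traffic and that the final stage lands at the end of the window; the exact timing Baseten used is not specified here, and canary_schedule is a hypothetical helper.

```python
from datetime import timedelta

CANARY_STAGES = 10  # stage count stated in the text above


def canary_schedule(window: timedelta) -> list[tuple[timedelta, int]]:
    """Return (offset from rollout start, traffic % on the new deployment)
    for each stage, assuming evenly spaced stages of equal size."""
    interval = window / CANARY_STAGES
    return [
        (interval * stage, 100 * stage // CANARY_STAGES)
        for stage in range(1, CANARY_STAGES + 1)
    ]


if __name__ == "__main__":
    for offset, pct in canary_schedule(timedelta(minutes=60)):
        print(f"+{offset}: {pct}% of traffic on the new deployment")
```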

Managing deployments

Naming deployments

By default, deployments of a model are named sequentially: deployment-1, deployment-2, and so on. You can instead assign a custom name in one of two ways:
  1. While creating the deployment, using a command line argument in truss push.
  2. After creating the deployment, in the model management page within your Baseten dashboard.
Renaming a deployment is purely cosmetic and does not affect model management API paths, which are keyed by model and deployment IDs.

Deactivating a deployment

Deactivate a deployment to suspend inference execution while preserving configuration.
  • Remains visible in the dashboard.
  • Consumes no compute resources but can be reactivated anytime.
  • API requests return a 404 error while deactivated.
For demand-driven deployments, consider autoscaling with scale to zero.

Deleting deployments

You can permanently delete deployments, but production deployments must be replaced before deletion.
  • Deleted deployments are purged from the dashboard but retained in usage logs.
  • All associated compute resources are released.
  • API requests return a 404 error post-deletion.
Deletion is irreversible. Use deactivation if retention is required.
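Since both deactivated and deleted deployments answer with a 404, client code may want to treat that status separately from other failures. The following is a minimal sketch; the enum and function names are hypothetical, and it assumes nothing about the response body.

```python
from enum import Enum


class DeploymentAvailability(Enum):
    OK = "ok"
    NOT_FOUND = "deactivated_or_deleted"
    OTHER_ERROR = "other_error"


def classify_status(status_code: int) -> DeploymentAvailability:
    """Map an HTTP status code from a predict call to a coarse state."""
    if 200 <= status_code < 300:
        return DeploymentAvailability.OK
    if status_code == 404:
        # The status alone cannot tell deactivation from deletion;
        # check the dashboard to find out which applies.
        return DeploymentAvailability.NOT_FOUND
    return DeploymentAvailability.OTHER_ERROR


if __name__ == "__main__":
    print(classify_status(404))
```

With the requests example shown earlier, this would be called as classify_status(resp.status_code) before reading resp.json().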