Deploy, manage, and scale machine learning models with Baseten
A deployment in Baseten is a containerized instance of a model that serves inference requests via an API endpoint. Deployments exist independently but can be promoted to an environment for structured access and scaling.

Every deployment is automatically wrapped in a REST API. Once deployed, a model can be queried with a simple HTTP request:
```python
import requests

resp = requests.post(
    "https://model-{modelID}.api.baseten.co/deployment/{deploymentID}/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"text": "Hello my name is {MASK}"},
)
print(resp.json())
```
A development deployment is a mutable instance designed for rapid iteration. It always remains in the development state and cannot be renamed or moved out of it.

Key characteristics:
Live reload enables direct updates without redeployment.
Single replica, scales to zero when idle to conserve compute resources.
No autoscaling or zero-downtime updates.
Can be promoted to create a persistent deployment.
Once promoted, the development deployment produces a persistent deployment, which can in turn be promoted to an environment.
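The promotion step above can be sketched as a call to Baseten's management API. This is a minimal sketch, not the documented client: the endpoint path (`/v1/models/{model_id}/deployments/development/promote`) and the helper names are assumptions for illustration.

```python
import requests


def promote_url(model_id: str) -> str:
    # Hypothetical helper: builds the management-API URL for promoting
    # a model's development deployment (path shape is an assumption).
    return f"https://api.baseten.co/v1/models/{model_id}/deployments/development/promote"


def promote_development_deployment(model_id: str, api_key: str) -> dict:
    # Promotion snapshots the current state of the development
    # deployment into a new persistent deployment.
    resp = requests.post(
        promote_url(model_id),
        headers={"Authorization": f"Api-Key {api_key}"},
    )
    resp.raise_for_status()
    return resp.json()
```

Check your workspace's API reference for the exact endpoint before relying on this shape.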
Environments provide logical isolation for managing deployments but are not required for a deployment to function. A deployment can be executed independently or promoted to an environment for controlled traffic allocation and scaling.
The production environment exists by default.
Custom environments (e.g., staging) can be created for specific workflows.
Promoting a deployment does not modify its behavior, only its routing and lifecycle management.
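Since promotion only changes routing, requests can target an environment rather than a specific deployment ID. The sketch below assumes the URL shapes `/production/predict` for the default environment and `/environments/{name}/predict` for custom ones; verify these paths against your model's API reference.

```python
import requests


def environment_predict_url(model_id: str, environment: str) -> str:
    # Assumed URL shapes: the production environment has a dedicated
    # path, while custom environments are addressed by name.
    if environment == "production":
        return f"https://model-{model_id}.api.baseten.co/production/predict"
    return f"https://model-{model_id}.api.baseten.co/environments/{environment}/predict"


# Example call (requires a live deployment promoted to the environment;
# "abc123" is a hypothetical model ID):
# resp = requests.post(
#     environment_predict_url("abc123", "staging"),
#     headers={"Authorization": "Api-Key YOUR_API_KEY"},
#     json={"text": "Hello my name is {MASK}"},
# )
# print(resp.json())
```

Because the environment URL is stable, promoting a new deployment into it requires no changes on the caller's side.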
By default, deployments of a model are named sequentially: deployment-1, deployment-2, and so on. You can instead give deployments custom names via two methods: