When you run truss push, Baseten creates a deployment: a running instance of your model on GPU infrastructure with an API endpoint. This page explains how deployments are managed, versioned, and scaled.
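For example, the config.yaml that truss push reads might look like the minimal sketch below; the model name and dependency list are illustrative, not required values.

```yaml
# config.yaml -- a minimal Truss configuration. Running truss push with
# this file creates a new deployment. Name and dependencies are illustrative.
model_name: my-model
python_version: py311
requirements:
  - torch
resources:
  accelerator: L4   # instance type; see Resources below
  use_gpu: true
```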

Deployments

A deployment is a single version of your model running on specific hardware. Every truss push creates a new deployment. You can have multiple deployments of the same model running simultaneously, which is how you test new versions without affecting production traffic. Deployments can be deactivated to stop serving (and stop incurring cost) or deleted permanently when no longer needed.

Environments

As your model matures, you need a way to manage releases. Environments provide stable endpoints that persist across deployments. A typical setup has a development environment for testing and a production environment for live traffic. Each environment maintains its own autoscaling settings, metrics, and endpoint URL. When a new deployment is ready, you promote it to an environment, and traffic shifts to the new version without changing the endpoint your application calls.

Resources

Every deployment runs on a specific instance type that defines its GPU, CPU, and memory allocation. Choosing the right instance balances inference speed against cost. You set the instance type in your config.yaml before deployment, or adjust it later in the dashboard. Smaller models run well on an L4 (24 GB VRAM), while large LLMs may need A100s or H100s with tensor parallelism across multiple GPUs.
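As a sketch, assuming the standard Truss config keys, the resources block in config.yaml for the two cases might look like this. The :4 count suffix for requesting multiple GPUs per replica is an assumption about the accelerator syntax, so confirm it against the Truss reference for your version.

```yaml
# Small model: a single L4 (24 GB VRAM) is usually enough.
resources:
  accelerator: L4
  use_gpu: true
  cpu: "4"
  memory: 16Gi
---
# Large LLM: request multiple GPUs per replica for tensor parallelism.
# The :4 count suffix is an assumed syntax for four GPUs per replica;
# confirm against the Truss reference for your version.
resources:
  accelerator: H100:4
  use_gpu: true
```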

Autoscaling

You don't manage replicas manually. Autoscaling adjusts the number of running instances based on incoming traffic. You configure a minimum and maximum replica count, a concurrency target, and a scale-down delay. When traffic drops, replicas scale down (optionally to zero, eliminating all cost). When traffic spikes, new replicas come up within seconds. Cold start optimization and network acceleration keep response times fast even when scaling from zero.
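These settings are configured per deployment or environment in the dashboard or API rather than in a single file; the YAML below only illustrates the knobs involved, with hypothetical field names.

```yaml
# Illustrative autoscaling settings. Field names are assumptions, not a
# literal Baseten schema; actual values are set per deployment or environment.
autoscaling:
  min_replicas: 0         # scale to zero when idle: no replicas, no cost
  max_replicas: 8         # upper bound during traffic spikes
  concurrency_target: 4   # in-flight requests per replica before scaling out
  scale_down_delay: 300   # seconds of low traffic before removing a replica
```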