The overview covers Baseten’s capabilities. This page covers the underlying mechanics: how a config file becomes a running endpoint, how Baseten routes requests to your model, how the autoscaler manages capacity, and how you promote a model from development to production.

Multi-cloud Capacity Management (MCM)

Behind every Baseten deployment is our Multi-cloud Capacity Management (MCM) system. MCM acts as the infrastructure control plane, unifying thousands of GPUs across 10+ cloud service providers and multiple geographic regions. When you request a resource—whether an H100 in US-East-1 or a cluster of B200s in a private region—MCM provisions the hardware, configures networking, and monitors health. It abstracts differences between cloud providers to ensure the Baseten Inference Stack runs identically on any underlying infrastructure. This system powers Baseten’s high availability by enabling active-active deployments across different clouds. If a region or provider faces a capacity crunch or outage, MCM rapidly re-routes and re-provisions workloads to maintain service continuity.

The build pipeline

1. Upload project. When you run truss push, the CLI validates your config.yaml, archives your project directory, and uploads it to cloud storage. Baseten receives the archive and starts the build.

2. Process model weights. For Engine-Builder-LLM, Baseten downloads model weights from the source repository (Hugging Face, S3, or GCS) and compiles them with TensorRT-LLM. The compilation step builds optimized CUDA kernels for the target GPU architecture, applies quantization (FP8, FP4) if configured, and sets up tensor parallelism across multiple GPUs.

3. Package and deploy. Baseten packages the compiled engine, runtime configuration, and serving infrastructure into a container, deploys it to GPU infrastructure, and exposes it as an API endpoint.

The truss push command returns once the upload finishes. For engine-based deployments, compilation can take several minutes. Watch progress in the deployment logs or check the dashboard, which shows “Active” when the endpoint is ready for requests. For custom model code deployments, the build is faster: Baseten installs your Python dependencies, packages your Model class into a container, and deploys it. You remain responsible for any inference optimization in custom builds.
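As a sketch of what the custom-code path packages, here is a minimal Model class. The Hugging Face pipeline, model name, and input schema are illustrative choices for this example, not requirements of the platform:

    # model/model.py -- sketch of a custom-code Model class (illustrative model
    # and input schema).
    from transformers import pipeline

    class Model:
        def __init__(self, **kwargs):
            self._pipeline = None

        def load(self):
            # Runs once per replica at startup, before the replica accepts traffic.
            self._pipeline = pipeline(
                "text-classification",
                model="distilbert-base-uncased-finetuned-sst-2-english",
            )

        def predict(self, model_input):
            # Receives the request body as JSON and returns JSON-serializable output.
            return self._pipeline(model_input["text"])

On push, Baseten installs the declared Python dependencies, bakes this class into a container, and wires its predict method to the deployment's predict endpoint.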

Request routing

Each deployment gets a dedicated subdomain: https://model-{model_id}.api.baseten.co/. The URL path determines which deployment handles the request. Requests to /production/predict go to the production environment, while /development/predict goes to the development deployment. You can also target a specific deployment by ID or a custom environment by name.

Once the environment is resolved, the load balancer routes the request to an active replica. If the model has scaled to zero, Baseten spins up a replica and queues the request until the model loads and becomes ready. The caller receives the response regardless of whether the model was warm or cold.

Engine-based deployments serve an OpenAI-compatible API at the /v1/chat/completions path, so any code written for the OpenAI SDK works without modification. Custom model deployments use the predict API, which accepts and returns arbitrary JSON.

For long-running workloads, async requests return a request ID immediately. The request enters a queue managed by an async request service. A background worker then calls your model and delivers the result via webhook. Sync requests take priority over async requests when competing for concurrency slots to prevent background work from starving real-time traffic.
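As a concrete sketch of the predict API, the call below targets a custom model deployment's production environment. The model ID and payload are placeholders, and the Api-Key authorization header format is an assumption; copy the exact invocation snippet from your model's dashboard.

    import os

    import requests

    MODEL_ID = "abcd1234"  # placeholder; use your deployment's model ID
    API_KEY = os.environ["BASETEN_API_KEY"]

    # Synchronous call to the production environment's predict endpoint.
    resp = requests.post(
        f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
        headers={"Authorization": f"Api-Key {API_KEY}"},  # header format assumed
        json={"text": "hello"},  # arbitrary JSON, defined by your Model.predict
    )
    resp.raise_for_status()
    print(resp.json())

Swapping /production/ for /development/ in the URL sends the same request to the development deployment instead.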

Autoscaling

Baseten’s autoscaler watches in-flight request counts and adjusts replicas to maintain each one near its concurrency target.

Scaling up is immediate. When average utilization crosses the target threshold (default 70%) within the autoscaling window (default 60 seconds), the autoscaler adds replicas up to the configured maximum.

Scaling down is deliberately slow. When traffic drops, the autoscaler flags excess replicas for removal but keeps them alive for a configurable delay (default 900 seconds). It uses exponential backoff: removing half the excess replicas, waiting, and then removing half again. This prevents the cluster from thrashing during bursty traffic.

Setting min_replica to 0 enables scale-to-zero. The model stops incurring GPU cost when idle, but the next request triggers a cold start. Setting min_replica to 1 or higher keeps warm capacity ready at all times, trading cost for lower latency.
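The scale-down behavior reads naturally as a halving loop. The sketch below is only an illustration of the backoff described above, not Baseten's autoscaler code; the function name and defaults simply mirror the text.

    import math
    import time

    def scale_down(current_replicas, target_replicas, delay_seconds=900):
        # Illustrative only: wait out the scale-down delay, remove half of the
        # remaining excess replicas, and repeat until the target is reached.
        while current_replicas > target_replicas:
            time.sleep(delay_seconds)
            excess = current_replicas - target_replicas
            current_replicas -= math.ceil(excess / 2)
        return current_replicas

Starting from 10 replicas with a target of 2, the count steps down 10 → 6 → 4 → 3 → 2 rather than dropping all at once, which is what keeps bursty traffic from thrashing the cluster.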

Cold starts and the weight delivery network

The slowest part of a cold start is loading model weights, which can reach hundreds of gigabytes. Baseten addresses this with the Baseten Delivery Network (BDN), a multi-tier caching system for model weights. When you first deploy, BDN mirrors your model weights from the source repository to Baseten’s own blob storage. After that, no cold start depends on an upstream service like Hugging Face or S3.

When a new replica starts, the BDN agent on the node fetches a manifest for the weights, downloads them through an in-cluster cache (shared across all pods in the cluster), and stores them in a node-level cache (shared across all replicas on the same node). Identical files across different models are deduplicated, so a GLM fine-tune that shares most weights with the base model only downloads the delta.

Subsequent cold starts on the same node or in the same cluster are significantly faster than the first. Container images use streaming, so the model begins loading weights before the image download completes.
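The deduplication step can be pictured as a content-addressed cache. The sketch below is a simplification, not the BDN agent's implementation; the manifest format, cache path, and download callable are assumptions made for illustration.

    from pathlib import Path

    NODE_CACHE = Path("/cache/weights")  # stand-in for the node-level cache

    def fetch_weights(manifest, download_blob):
        """Fetch one model's weight files, reusing anything already cached.

        manifest: iterable of (filename, content_hash) pairs (format assumed).
        download_blob: callable returning a blob's bytes for a content hash,
        e.g. by way of the in-cluster cache.
        """
        NODE_CACHE.mkdir(parents=True, exist_ok=True)
        local_files = {}
        for filename, content_hash in manifest:
            cached = NODE_CACHE / content_hash
            if not cached.exists():
                # Only blobs the node has never seen are downloaded; files shared
                # with another model (say, a fine-tune's unchanged layers) are reused.
                cached.write_bytes(download_blob(content_hash))
            local_files[filename] = cached
        return local_files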

Environments and promotion

Every model starts with a development deployment: a single replica with scale-to-zero enabled and live reload for fast iteration. When the model is ready for production traffic, promote it to an environment.

The production environment exists by default. You can create additional environments, like staging, shadow, or canary, for testing and gradual rollouts. Each environment has a stable endpoint URL, its own autoscaling settings, and dedicated metrics. The endpoint URL remains constant when you promote new deployments, so your application code doesn’t need to change.

Promotion replaces the current deployment in an environment with the new one. The new deployment inherits the environment’s autoscaling settings. Baseten demotes the previous deployment and scales it to zero, allowing you to roll back by re-promoting it.

You can also push directly to an environment with truss push --environment staging to skip the development stage. Only one promotion can be active per environment at a time to prevent conflicting updates.
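Because environment endpoints are stable, application code can pin one URL per environment and leave it untouched across promotions. A short sketch using the same placeholder model ID as above; the environments/{name} path shown for custom environments is an assumption, so confirm the exact URL in your dashboard.

    MODEL_ID = "abcd1234"  # placeholder

    BASE_URL = f"https://model-{MODEL_ID}.api.baseten.co"

    # These URLs stay constant when a new deployment is promoted into the
    # environment, so nothing in the application changes on promotion.
    PRODUCTION_PREDICT = f"{BASE_URL}/production/predict"
    STAGING_PREDICT = f"{BASE_URL}/environments/staging/predict"  # path assumed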