How Baseten works
Baseten is a platform for building, serving, and scaling AI models in production.
It supports multiple entry points depending on your workflow—whether you’re deploying a dedicated model, calling an open-source LLM via our Model APIs, or training from scratch.
At the core is the Baseten Inference Stack: performant model runtimes on top of inference-optimized infrastructure. Instead of managing infrastructure, scaling policies, and performance optimization, you can focus on building and iterating on your AI-powered applications.
Dedicated deployments
This is the primary workflow for teams deploying custom, open-source, or fine-tuned models with full control.
Baseten’s deployment stack is structured around four key pillars:
Development
Package any model using Truss, our open-source framework for defining dependencies, hardware, and custom logic—no Docker required. For more advanced use cases, build compound inference systems using Chains, orchestrating multiple models, APIs, and processing steps.

Developing a model
Package and deploy any AI/ML model as an API with Truss or a Custom Server.
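As a concrete sketch, a Truss model is a Python class with `load` and `predict` methods. The class shape below follows Truss's documented convention; the classifier pipeline and payload shape are illustrative assumptions, not requirements.

```python
# model/model.py: a minimal Truss model. The load/predict class shape
# follows Truss's convention; the pipeline and payload are illustrative.
from transformers import pipeline


class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Runs once when the deployment starts: load weights into memory.
        self._model = pipeline("text-classification")

    def predict(self, model_input: dict) -> dict:
        # Runs per request; model_input is the parsed JSON request body.
        return {"predictions": self._model(model_input["text"])}
```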

Developing a Chain
Build multi-model workflows by chaining models, pre/post-processing, and business logic.
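To make the Chains idea concrete, here is a minimal two-Chainlet sketch using the `truss_chains` SDK; the Chainlet names and the pre-processing logic are illustrative, not prescribed.

```python
import truss_chains as chains


class Preprocess(chains.ChainletBase):
    """A stand-in pre-processing step; real logic goes here."""

    def run_remote(self, text: str) -> str:
        return text.strip().lower()


@chains.mark_entrypoint
class Pipeline(chains.ChainletBase):
    def __init__(self, preprocess: Preprocess = chains.depends(Preprocess)):
        self._preprocess = preprocess

    def run_remote(self, text: str) -> str:
        cleaned = self._preprocess.run_remote(text)
        # Additional steps (a model call, post-processing, an external
        # API) would be chained here in the same way.
        return cleaned
```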
Deployment
Deploy models to dedicated, autoscaling infrastructure. Use Environments for controlled versioning, rollouts, and promotion between staging and production. Deployments also support scale-to-zero, canary deploys, and structured model management.
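As one illustration of promotion, the sketch below calls Baseten's REST management API to promote a deployment to production. The endpoint path and the IDs here are assumptions for illustration; verify them against the API reference before relying on them.

```python
import os

import requests

API_KEY = os.environ["BASETEN_API_KEY"]
MODEL_ID = "abc123"       # hypothetical model ID
DEPLOYMENT_ID = "def456"  # hypothetical deployment ID

# Assumed endpoint shape for promoting a deployment to an environment;
# check the management API reference for the authoritative path.
resp = requests.post(
    f"https://api.baseten.co/v1/models/{MODEL_ID}"
    f"/deployments/{DEPLOYMENT_ID}/promote",
    headers={"Authorization": f"Api-Key {API_KEY}"},
)
resp.raise_for_status()
```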

Inference
Serve synchronous, asynchronous, and streaming predictions with configurable execution controls. Optimize for latency, throughput, or cost depending on your application’s needs.
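For example, a synchronous prediction against a dedicated deployment is a plain HTTP POST; the model ID and input payload below are placeholders (the payload schema is whatever your model's `predict` expects), and asynchronous requests follow a similar shape via the async endpoint.

```python
import os

import requests

API_KEY = os.environ["BASETEN_API_KEY"]
MODEL_ID = "abc123"  # hypothetical model ID

# Synchronous prediction against the production environment. The JSON
# payload schema is whatever your model's predict method expects.
resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/environments/production/predict",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    json={"text": "Hello, world"},
)
print(resp.json())
```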

Observability
Monitor model health and performance with real-time metrics, logs, and detailed request traces. Export data to observability tools like Datadog or Prometheus. Debug behavior with full visibility into inputs, outputs, and errors.

This full-stack infrastructure, from packaging to observability, is powered by the Baseten Inference Stack: performant model runtimes, cross-cloud availability, and seamless developer workflows.
Model APIs
Model APIs offer a fast, reliable path to production for LLM-powered features. Use OpenAI-compatible endpoints to call performant open-source models like Llama 4, DeepSeek, and Qwen, with support for structured outputs and tool calling.
If your code already works with OpenAI’s SDKs, it’ll work with Baseten—no wrappers or rewrites required.
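A minimal sketch, assuming the OpenAI-compatible base URL and an illustrative model slug:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at Baseten's OpenAI-compatible API.
client = OpenAI(
    base_url="https://inference.baseten.co/v1",
    api_key=os.environ["BASETEN_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",  # illustrative model slug
    messages=[{"role": "user", "content": "Summarize what Baseten does."}],
)
print(response.choices[0].message.content)
```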
Training
Baseten Training provides scalable infrastructure for running containerized training jobs. Define your code, environment, and compute resources; manage checkpoints and logs; and transition seamlessly from training to deployment.
Organize work with TrainingProjects and track reproducible runs via TrainingJobs. Baseten supports any framework, from PyTorch to custom setups, with centralized artifact and job management.
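To show how those pieces fit together, here is a hypothetical sketch of a training configuration that pairs a TrainingJob (code, environment, compute) with a TrainingProject; the module, class, and field names are illustrative assumptions, not the verbatim SDK surface.

```python
# Hypothetical sketch only: the module, class, and field names below
# are illustrative assumptions, not the verbatim SDK surface.
from truss_train import definitions

job = definitions.TrainingJob(
    image=definitions.Image(base_image="pytorch/pytorch:2.3.1-cuda12.1-cudnn8-runtime"),
    compute=definitions.Compute(node_count=1, accelerator="H100"),
    runtime=definitions.Runtime(start_commands=["python train.py"]),
)

project = definitions.TrainingProject(name="my-finetune", job=job)
```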
Summary
- Use Dedicated Deployments to run and scale production-grade models with full control.
- Use Model APIs to quickly build LLM-powered features without managing infrastructure.
- Use Training to run reproducible training jobs and productionize your own models.
Each product is built on the same core: reliable infrastructure, strong developer ergonomics, and a focus on operational excellence.