Baseten runs trained models as production endpoints. You package the model with Truss, our open-source CLI, then push to Baseten for deployment, autoscaling, and observability.
Pick a starting point
Two paths cover most deployments:
- Config-only: Most popular open-source LLMs deploy from a single config.yaml. Baseten compiles and serves them on TensorRT-LLM with an OpenAI-compatible API. Start with Build your first model.
- Custom Python: When you need preprocessing, postprocessing, or an unsupported architecture, write a Model class with __init__, load, and predict. Start with Custom model code.
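For the Custom Python path, the shape of the Model class can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the "text" input key, the "output" result key, and the uppercase stand-in for a real model are all assumptions made for the example, and the comments on when each method runs reflect typical usage rather than a guarantee.

```python
from typing import Any


class Model:
    def __init__(self, **kwargs: Any) -> None:
        # Keep construction cheap: store state only, load nothing heavy here.
        self._model = None

    def load(self) -> None:
        # Intended for one-time setup such as loading weights.
        # Assumption: a trivial callable stands in for a real model.
        self._model = lambda text: text.upper()

    def predict(self, model_input: dict) -> dict:
        # Preprocessing: pull the (hypothetical) "text" field and normalize it.
        text = str(model_input["text"]).strip()
        # Inference with the stand-in model.
        output = self._model(text)
        # Postprocessing: wrap the result in a JSON-serializable dict.
        return {"output": output}


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": "  hello  "}))  # {'output': 'HELLO'}
```

Running the file locally exercises the same load-then-predict sequence before anything is pushed.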
Anatomy of a Truss
A Truss is the packaged unit you push to Baseten. It contains:
- config.yaml: runtime environment, dependencies, GPU, and deployment settings. See Configuration.
- model/model.py (custom code only): the Model class with load and predict. See Implementation.
- data/, packages/, and weights: optional assets. See Data and storage and BDN for large weights.
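To make the layout concrete, the sketch below builds a placeholder directory with the pieces listed above and reports which ones are present. Both helpers (scaffold_example and describe_truss_dir) are hypothetical and not part of Truss; the file contents are stubs, and the directory name is an assumption for illustration.

```python
import tempfile
from pathlib import Path


def scaffold_example(root: Path) -> None:
    """Create a placeholder directory with the layout described above."""
    (root / "model").mkdir(parents=True)
    (root / "data").mkdir()
    (root / "packages").mkdir()
    # Stub files only; real contents are covered in Configuration and Implementation.
    (root / "config.yaml").write_text("# runtime, dependencies, GPU, deployment settings\n")
    (root / "model" / "model.py").write_text("class Model:\n    pass\n")


def describe_truss_dir(root: Path) -> dict:
    """Report which parts of the layout exist under root."""
    return {
        "config.yaml": (root / "config.yaml").is_file(),
        "model/model.py": (root / "model" / "model.py").is_file(),
        "data/": (root / "data").is_dir(),
        "packages/": (root / "packages").is_dir(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my-truss"  # hypothetical directory name
        scaffold_example(root)
        for part, present in describe_truss_dir(root).items():
            print(f"{part:<16} {'present' if present else 'missing'}")
```

A config-only deployment would show model/model.py as missing, which matches the "custom code only" note in the list.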

Development loop
truss push --watch creates a development deployment that live-patches code changes in seconds and scales to one replica. truss push (no flags) creates a published deployment, autoscaled and ready for production traffic.
Iterate in development mode, then promote with truss push --promote when the model is ready. See Deploy and iterate for the full workflow and Environments for promotion semantics.
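The three commands above can be summarized in a small helper, for example in a script that wraps the CLI. This is a sketch under stated assumptions: the stage labels ("development", "published", "promote") and the function name are hypothetical, and only the flags named above are used. The helper builds the command without running it.

```python
import shlex

# Flags taken from the text above; the stage labels are hypothetical names.
_STAGE_FLAGS = {
    "development": ["--watch"],   # live-patched development deployment
    "published": [],              # published deployment, no flags
    "promote": ["--promote"],     # promote when the model is ready
}


def truss_push_command(stage: str) -> list[str]:
    """Return the truss push argument list for a stage, without executing it."""
    if stage not in _STAGE_FLAGS:
        raise ValueError(f"unknown stage: {stage!r}")
    return ["truss", "push", *_STAGE_FLAGS[stage]]


if __name__ == "__main__":
    for stage in _STAGE_FLAGS:
        print(f"{stage:<12} {shlex.join(truss_push_command(stage))}")
```

Returning an argument list rather than a string keeps the result safe to pass to subprocess.run without shell quoting concerns.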