Baseten runs trained models as production endpoints. You package the model with Truss, our open-source CLI, then push to Baseten for deployment, autoscaling, and observability.

Pick a starting point

Two paths cover most deployments:
  • Config-only: Most popular open-source LLMs deploy from a single config.yaml. Baseten compiles and serves them on TensorRT-LLM with an OpenAI-compatible API. Start with Build your first model.
  • Custom Python: When you need preprocessing, postprocessing, or an unsupported architecture, write a Model class with __init__, load, and predict. Start with Custom model code.
For pre-built containers like vLLM, SGLang, or Triton, see Custom Docker containers.
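
As a rough illustration of the Custom Python path, the sketch below shows a Model class with the three methods named above. The keyword-only constructor, the dictionary input shape, and the toy word-score "model" are assumptions made for this example, not the documented interface; see Custom model code for the actual contract.

```python
class Model:
    """Minimal sketch of a custom model with __init__, load, and predict."""

    def __init__(self, **kwargs):
        # Assumption: the serving runtime passes settings as keyword arguments.
        # They are stored here but not used in this sketch.
        self._kwargs = kwargs
        self._vocab = None  # stand-in for real model weights

    def load(self):
        # Intended to run once at startup. A real model would load weights here;
        # this toy version just builds a word-score table.
        self._vocab = {"good": 1, "great": 1, "bad": -1, "awful": -1}

    def predict(self, model_input):
        # Preprocessing: lowercase and split the (hypothetical) "text" field.
        tokens = model_input["text"].lower().split()
        # "Inference": sum the scores of known words.
        score = sum(self._vocab.get(token, 0) for token in tokens)
        # Postprocessing: map the numeric score to a label.
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        return {"label": label, "score": score}


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": "A great model"}))
```

Keeping the expensive setup in load rather than __init__ means the constructor stays cheap and predict can assume the model is already in memory.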

Anatomy of a Truss

A Truss is the packaged unit you push to Baseten. It contains:
  • config.yaml: runtime environment, dependencies, GPU, and deployment settings. See Configuration.
  • model/model.py (custom code only): the Model class with load and predict. See Implementation.
  • data/, packages/, and weights: optional assets. See Data and storage and BDN for large weights.
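
To make the layout concrete, the following sketch writes a skeleton Truss directory to a temporary folder and lists what it created. The config keys and values (model_name, python_version, requirements, resources) and the directory name my-truss are illustrative assumptions; check Configuration for the supported fields before relying on them.

```python
import tempfile
from pathlib import Path

# Assumption: illustrative keys and values only; verify against Configuration.
CONFIG_YAML = """\
model_name: example-model
python_version: py311
requirements:
  - torch
resources:
  accelerator: L4
  use_gpu: true
"""

# Placeholder model code for the custom-code path.
MODEL_PY = '''\
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        self._model = lambda x: x  # placeholder for real weights

    def predict(self, model_input):
        return self._model(model_input)
'''


def scaffold_truss(root: Path) -> list[str]:
    """Create the files and folders listed above and return their relative paths."""
    (root / "model").mkdir(parents=True, exist_ok=True)
    (root / "data").mkdir(exist_ok=True)      # optional assets
    (root / "packages").mkdir(exist_ok=True)  # optional local packages
    (root / "config.yaml").write_text(CONFIG_YAML)
    (root / "model" / "model.py").write_text(MODEL_PY)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for entry in scaffold_truss(Path(tmp) / "my-truss"):
            print(entry)
```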

Development loop

truss push --watch creates a development deployment that live-patches code changes in seconds and scales to at most one replica. truss push with no flags creates a published deployment that autoscales and is ready for production traffic. Iterate in development mode, then run truss push --promote to promote the model when it is ready. See Deploy and iterate for the full workflow and Environments for promotion semantics.
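
The three commands in this loop can be summarized in a small helper that maps each stage to its command line. The stage names (develop, publish, promote) are hypothetical labels chosen for this sketch; only the truss push flags come from the text above.

```python
import shlex

# Flags are taken from the paragraph above; the stage names are hypothetical.
STAGE_FLAGS = {
    "develop": ["--watch"],    # development deployment with live patching
    "publish": [],             # published deployment, autoscaled
    "promote": ["--promote"],  # promote once the model is ready
}


def push_command(stage: str) -> list[str]:
    """Return the argv list for the given stage of the development loop."""
    if stage not in STAGE_FLAGS:
        raise ValueError(f"unknown stage: {stage!r}")
    return ["truss", "push", *STAGE_FLAGS[stage]]


if __name__ == "__main__":
    for stage in STAGE_FLAGS:
        print(f"{stage:8} -> {shlex.join(push_command(stage))}")
```

Passing the returned list to subprocess.run(..., check=True) would execute the command, assuming the truss CLI is installed and the working directory is the Truss you want to push.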