Baseten makes it easy to go from a trained machine learning model to a fully deployed, production-ready API. You’ll use Truss, our open-source model packaging tool, to containerize your model code and configuration, and ship it to Baseten for deployment, testing, and scaling.

What does it mean to develop a model?

In Baseten, developing a model means:
  1. Packaging your model code and weights: Wrap your trained model into a structured project that includes your inference logic and dependencies.
  2. Configuring the model environment: Define everything needed to run your model, including Python packages, system dependencies, and secrets.
  3. Deploying and iterating quickly: Push your model to Baseten and iterate with live edits using truss push --watch.
Once your model works the way you want, you can promote it to production, ready for live traffic.

Development flow on Baseten

Here’s what the typical model development loop looks like:
  1. Initialize a new model project with truss init.
  2. Add your model logic to a Python class (model.py), specifying how to load and run inference.
  3. Configure dependencies in a YAML or Python config.
  4. Deploy the model with truss push for a published deployment, or truss push --watch for a development deployment with live-reloading.
  5. Iterate and test using truss watch to sync changes to your dev deployment as you tune the model.
  6. Promote to production with truss push when you’re ready to scale.
When you push a model with Truss, it runs in a standardized container on Baseten without needing Docker installed locally. Truss also gives you a fast developer loop and a consistent way to configure and serve models.
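For orientation, step 1 above (truss init) scaffolds a small project. The layout typically looks something like this (exact files may vary by Truss version):

```text
my-model/
├── config.yaml      # runtime environment and deployment settings
└── model/
    └── model.py     # Model class with load() and predict()
```

You edit model.py and config.yaml in place, and truss watch syncs those edits to your development deployment.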

What is Truss?

Truss is the tool you use to:
  • Scaffold a new model project
  • Serve models locally or in the cloud
  • Package your code, config, and model files
  • Push to Baseten for deployment
You can think of it as the developer toolkit for building and managing model servers, built specifically for machine learning workflows. With Truss, you can create a containerized model server and define everything about how your model runs: Python and system packages, GPU settings, environment variables, and custom inference logic.

Truss gives you a fast, reproducible dev loop where you test changes in a remote environment that mirrors production. It is flexible enough to support a wide range of ML stacks, including:
  • Model frameworks like PyTorch, transformers, and diffusers
  • Inference engines like TensorRT-LLM, SGLang, vLLM
  • Serving technologies like Triton
  • Any package installable with pip or apt
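As a sketch, a config.yaml for a transformers-based model might declare its dependencies and resources like this (the model name, package versions, and accelerator choice are illustrative, not prescriptive):

```yaml
model_name: my-text-classifier    # illustrative name
python_version: py311
requirements:                     # installed with pip
  - torch==2.3.0
  - transformers==4.41.0
system_packages:                  # installed with apt
  - ffmpeg
resources:
  accelerator: A10G               # GPU settings
  use_gpu: true
secrets:
  hf_access_token: null           # resolved from Baseten-stored secrets at runtime
```

Anything you would normally install or configure by hand lives in this file, which is what makes the resulting container reproducible.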
You’ll use Truss throughout this guide, but the focus stays on how you develop models, not just how Truss works.

From model to server: the key components

When you develop a model on Baseten, you define:
  • A Model class: Defines how your model is loaded (in load) and how inference runs (in predict).
  • A configuration file (config.yaml or Python config): Defines the runtime environment, dependencies, and deployment settings.
  • Optional extra assets, like model weights, secrets, or external packages.
Together, these components form a Truss: a portable, reproducible package that you deploy, version, and scale on Baseten.
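To make this concrete, here is a minimal sketch of a model/model.py. The lambda standing in for a real model is there only to keep the example self-contained; in practice load() would pull in weights from a framework like transformers:

```python
class Model:
    def __init__(self, **kwargs):
        # Truss passes runtime context (such as secrets) via kwargs.
        self._secrets = kwargs.get("secrets", {})
        self._model = None

    def load(self):
        # Runs once when the model server starts: load weights here.
        # Stand-in for loading a real model; returns the input's length.
        self._model = lambda text: {"length": len(text)}

    def predict(self, model_input):
        # Runs on every request; model_input is the parsed request body.
        text = model_input["text"]
        return self._model(text)
```

Baseten calls load() once at startup and predict() on each request, so you can exercise the same class directly in a local Python session while iterating.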

Development vs. published deployments

By default, truss push creates a published deployment.
  • Published deployment (truss push): Stable, autoscaled, and ready for live traffic. Does not support live-reloading.
  • Development deployment (truss push --watch): Meant for iteration and testing. Supports live-reloading for quick feedback loops and scales to a single replica.
Use development mode to build and test, then deploy a published version with truss push when you’re satisfied.