Model deployment tutorial
An in-the-weeds look at every aspect of model deployment.
You'll need to do two things before deploying a model:
- 1.
- 2.
Of course, you'll also need a model to deploy!
Truss is an open-source model packaging library developed by Baseten. Using Truss makes model deployment an interactive, configurable, reliable process and also lets you store, share, and version control your model however you'd like.
Truss, and by extension Baseten, currently supports Hugging Face, LightGBM, MLflow, PyTorch, scikit-learn, Tensorflow, and XGBoost out of the box. The process for deploying a model written in any of these frameworks is basically identical once the model is packaged as a Truss. Still, we have demo notebooks for deploying a model from each framework.
If your model wasn't created in a supported framework, see instructions for deploying a custom model. Or, open an issue to discuss adding support for your framework of choice to Truss.
We'll start out with a simple example using an XGBoost model. The deployment process starts from `model`, an in-memory object representing an ML model. Here's some sample training code:

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a toy binary classification dataset
X, y = make_classification(n_samples=100, n_informative=2, n_classes=2, n_features=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

train = xgb.DMatrix(X_train, y_train)
test = xgb.DMatrix(X_test, y_test)

params = {"learning_rate": 0.01, "max_depth": 3}
model = xgb.train(
    params,
    train,
    evals=[(train, "train"), (test, "validation")],
    num_boost_round=100,
    early_stopping_rounds=20,
)
```
The `truss` package is installed alongside `baseten`, so you can package your model without any additional `pip install` calls. Just run:

```python
import truss

truss.create(model, target_directory="my-model")
```

You now have a Truss containing your model in the folder `your-working-directory/my-model`, laid out in the standard Truss directory structure.

Technically, you don't have to manually package your model as a Truss: you can pass the in-memory model object directly into `baseten.deploy()` and it will be deployed. However, we strongly recommend making a Truss, as it gives you more configuration options and control when iterating on your deployed model.
With your model packaged, it's time to deploy it. Because it is packaged as a Truss, you don't need to be in the same script or notebook where you trained your model; you can write and manage your deployment code separately. To deploy your model, run:

```python
import baseten
import truss

my_model = truss.load("my-model")

baseten.login("YOUR_API_KEY")
baseten.deploy(
    my_model,
    model_name="My wonderful model"
)
```
This will deploy your model in a draft state. Draft models differ from published models in three important ways:

1. Draft models are mutable and are not versioned. This means you can change your model as a draft over and over again without incrementing versions or changing version IDs.
2. Most updates are compatible with live reload, making testing changes between 10X and 100X faster.
3. Draft models are not suitable for production workloads.
Draft models do not count against your workspace billing limits and only use free resources. If your paid workspace plan includes GPU access, draft models will also have GPU access.
Draft models have live reload, which means you can edit your Truss locally and re-deploy the updated model to Baseten in seconds, unlike the first deployment which took several minutes. This lets you test model changes quickly without setting up Docker locally to run a model server.
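To see what a live-reloadable edit looks like, here is a minimal sketch of a `model/model.py` file following the Truss `Model` class convention; the stand-in prediction logic is purely illustrative, not a real deserialized model:

```python
# Sketch of my-model/model/model.py. Editing this file and re-running
# baseten.deploy() triggers live reload rather than a full rebuild.
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # In a real Truss, deserialize your packaged model binary here.
        # Stand-in "model" that doubles its inputs, for illustration only:
        self._model = lambda inputs: [x * 2 for x in inputs]

    def predict(self, model_input):
        # Called with the request payload at inference time
        return self._model(model_input)
```

Tweaking `predict()` here is exactly the kind of change that re-deploys in seconds rather than minutes.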

A live reload workflow
Right now, draft models support live reload for:

- Changes to files and subdirectories in your Truss' `model/` directory, such as `your-truss/model/model.py`
- Changes to your required Python packages in `your-truss/config.yaml`
- Changes to your required system packages in `your-truss/config.yaml`
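For example, a Python or system package change of the kind listed above might look like this in `config.yaml`. The `requirements` and `system_packages` keys follow the Truss config format; the specific packages and versions are illustrative:

```yaml
# my-model/config.yaml (excerpt)
requirements:
  - xgboost==1.7.5
  - scikit-learn==1.2.2
system_packages:
  - libgomp1
```

Adding or pinning an entry under either key and re-deploying is a live-reloadable change.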
In the future, draft models will also support live reload for environment variables and model binaries.
After updating your Truss, just re-run:

```python
my_model = truss.load("my-model")

baseten.deploy(
    my_model,
    model_name="My wonderful model"  # model_name MUST stay the same between deployments of the same model
)
```
Your changes, if compatible with live reload, will be live within seconds. Otherwise, the model serving environment will be rebuilt and your changes will be live in a few minutes.
Draft models are intended for development and testing, not production workloads. As such, draft models have a few limitations:

1. Draft model resources are automatically released after 1 hour of inactivity. Thanks to this, draft models are free to deploy (they do not count against your workspace's model limit), but they have a cold start time on invocation that is unsuited for production use.
2. Requests to draft models may fail if they are sent while the model is updating.
3. Draft models cannot scale beyond the base resource config, so they cannot handle much traffic.
When your model is ready for production, it's time to publish your model.
Once your model is no longer a draft, it is assigned a version, counts against your workspace model limit for billing purposes, and consumes resources as configured.
You can publish your model via the Python client. Just add `publish=True` to your `baseten.deploy()` invocation:

```python
my_model = truss.load("my-model")

baseten.deploy(
    my_model,
    model_name="My wonderful model",  # model_name MUST stay the same between deployments of the same model
    publish=True
)
```
Your model will rebuild onto production infrastructure and you will receive an email when the process is complete.
You can use `publish=True` during your initial deployment in stage 2 to skip the draft model stage.

You can also publish your draft from the Baseten UI. From the model's page, click on the three-dot menu next to "Draft" and select "Publish model version."

If you do not want to publish your draft model to production, you can deactivate or delete it just like any other model version.
Your published model may need additional resources to handle production traffic. For more on resource management see the resource management docs.
If you want to deploy a new version of an existing model, follow the instructions in the model versions doc.