Model deployment tutorial

An in-the-weeds look at every aspect of model deployment.

Stage 0: Setup

You'll need to do two things before deploying a model:
  1. Install the Baseten Python client with pip install --upgrade baseten
  2. Create an API key for your Baseten account
Of course, you'll also need a model to deploy!

Stage 1: Packaging your model

Truss is an open-source model packaging library developed by Baseten. Using Truss makes model deployment an interactive, configurable, reliable process and also lets you store, share, and version control your model however you'd like.
Truss, and by extension Baseten, currently supports Hugging Face, LightGBM, MLflow, PyTorch, scikit-learn, TensorFlow, and XGBoost out of the box. The process for deploying a model written in any of these frameworks is essentially identical once the model is packaged as a Truss. Still, we have demo notebooks for deploying a model from each framework.
If your model wasn't created in a supported framework, see instructions for deploying a custom model. Or, open an issue to discuss adding support for your framework of choice to Truss.
We'll start out with a simple example using an XGBoost model. The deployment process starts from model, which is an in-memory object representing an ML model. Here's some sample training code:
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, n_informative=2, n_classes=2, n_features=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

train = xgb.DMatrix(X_train, y_train)
test = xgb.DMatrix(X_test, y_test)

params = {"learning_rate": 0.01, "max_depth": 3}
model = xgb.train(
    params,
    train,
    evals=[(train, "train"), (test, "validation")],
    num_boost_round=100,
    early_stopping_rounds=20,
)
The truss package is installed alongside baseten, so you can package your model without any additional pip install calls. Just run:
import truss
truss.create(model, target_directory="my-model")
You now have a Truss containing your model in the folder your-working-directory/my-model.
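The exact layout varies by Truss version, but a newly created Truss typically looks something like this (contents are illustrative; see the Truss docs for the authoritative structure):

```
my-model/
├── config.yaml      # model configuration: requirements, resources, etc.
├── data/            # serialized model binaries
└── model/
    ├── __init__.py
    └── model.py     # load and predict logic used by the model server
```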
Technically, you don't have to manually package your model as a Truss. You can pass the in-memory model object directly into baseten.deploy() and it will be deployed.
However, we strongly recommend making a Truss as it gives you more configuration options and control when iterating on your deployed model.
For more on using Truss, check out the Truss docs on local development.

Stage 2: Deploying a draft

With your model packaged, it's time to deploy it. Because your model is packaged as a Truss, you don't need to deploy it from the same script or notebook you trained it in; you can write and manage your model deployment code separately. To deploy your model, run:
import baseten
import truss

my_model = truss.load("my-model")
baseten.deploy(my_model, model_name="My wonderful model")
This will deploy your model in a draft state. Draft models differ from published models in three important ways:
  1. Draft models are mutable and are not versioned. This means you can change your model as a draft over and over again without incrementing versions or changing version IDs.
  2. Most updates are compatible with live reload, making testing changes between 10X and 100X faster.
  3. Draft models are not suitable for production workloads.
Draft models do not count against your workspace billing limits and only use free resources. If your paid workspace plan includes GPU access, draft models will also have GPU access.

Live reload

Draft models have live reload, which means you can edit your Truss locally and re-deploy the updated model to Baseten in seconds, unlike the first deployment which took several minutes. This lets you test model changes quickly without setting up Docker locally to run a model server.
A live reload workflow
Right now, draft models support live reload for:
  • Changes to files and subdirectories in your Truss' model/ directory (your-truss/model/)
  • Changes to your required Python packages in your-truss/config.yaml
  • Changes to your required system packages in your-truss/config.yaml
In the future, draft models will also support live reload for environment variables and model binaries.
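Both package lists live in your Truss' config.yaml, so adding a dependency is a one-line edit. A hedged example (the requirements and system_packages keys follow the Truss config schema; the specific packages are illustrative):

```yaml
# my-model/config.yaml (excerpt)
requirements:
  - xgboost==1.7.6
  - scikit-learn
system_packages:
  - libgomp1
```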
After updating your Truss, just re-run:
my_model = truss.load("my-model")
baseten.deploy(my_model, model_name="My wonderful model")  # model_name MUST stay the same between deployments of the same model
Your changes, if compatible with live reload, will be live within seconds. Otherwise, the model serving environment will be rebuilt and your changes will be live in a few minutes.

Draft model limitations

Draft models are intended for development and testing, not production workloads. As such, draft models have a few limitations:
  1. Draft model resources are automatically released after 1 hour of inactivity. Thanks to this, draft models are free to deploy (they do not count against your workspace's model limit) but have a cold start time on invocation that is unsuited for production use.
  2. Requests to draft models may fail if they are sent while the model is updating.
  3. Draft models cannot scale beyond the base resource config, so they cannot handle much traffic.

Stage 3: Publishing your model

When your model is ready for production, it's time to publish it.
Once published, your model is assigned a version. It counts against your workspace model limit for billing purposes and consumes resources as configured.

Publishing via the Python client

You can publish your model via the Python client. Just add publish=True to your baseten.deploy() invocation:
my_model = truss.load("my-model")
baseten.deploy(my_model, model_name="My wonderful model", publish=True)  # model_name MUST stay the same between deployments of the same model
Your model will rebuild onto production infrastructure and you will receive an email when the process is complete.
You can use publish=True during your initial deployment in stage 2 to skip the draft model stage.

Publishing via the Baseten UI

You can also publish your draft from the Baseten UI. From the model's page, click on the three-dot menu next to "Draft" and select "Publish model version."
Publish model version
Your model will rebuild onto production infrastructure and you will receive an email when the process is complete.
If you do not want to publish your draft model to production, you can deactivate or delete it just like any other model version.

Configuring model resources

Your published model may need additional resources to handle production traffic. For more on resource management, see the resource management docs.
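Resource settings also live in your Truss' config.yaml. A minimal sketch (field names follow the Truss config schema; the values are illustrative, not recommendations):

```yaml
# my-model/config.yaml (excerpt)
resources:
  cpu: "1"
  memory: 2Gi
  use_gpu: false
```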

Updating an existing model

If you want to deploy a new version of an existing model, follow the instructions in the model versions doc.