Getting started

Integrate Baseten into your ML workflow
It takes careful thought and real work to integrate any platform into a critical workflow. This document aims to smooth that process by introducing Baseten's platform for deploying and operating ML models, guiding you through incorporating it into your workflows, and sharing best practices for both MLOps and application engineering.
Let's get started!

Training models

Baseten is agnostic of model training workflows and will work with any model trained using any framework. Our customers train their models in a number of different ways: some train their models in Airflow, some in Databricks or Spark jobs, and some in Jupyter notebooks running in a cloud environment or even on their laptops. Baseten comes into the picture once a model has been trained and is ready to be packaged, deployed, and operationalized.

Deploying and invoking models

ML practitioners often struggle to go from a newly trained ML model to a microservice that serves that model scalably and observably. This struggle stems from a skill set mismatch: training models requires different skills than the infrastructure knowledge needed to package, deploy, serve, and monitor them. Baseten fills this gap.
Deploying a trained model to Baseten, in its simplest form, is a line of code:
import baseten

baseten_model = baseten.deploy(
    my_sklearn_pipeline,  # or a model from any other supported framework
    model_name='Activity classification',
)
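The object returned by baseten.deploy can also be used to call the hosted model directly from Python. A minimal sketch, assuming the returned handle exposes a predict method that forwards inputs to the deployed model; the input values are made up:

# Assumed usage of the handle returned by baseten.deploy
predictions = baseten_model.predict([[1.2, 0.4, 3.1]])
print(predictions)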
Once the model is deployed, it gets its own landing page with links to all of the functionality provided out of the box.
Deployed model landing page

Model packaging

Underlying this deployment mechanism is our open-source model packaging system, Truss. Using Truss directly gives you additional flexibility: custom requirements (both Python and OS-level packages), inference-time pre- and post-processors, and local testing of packaged models. Creating a Truss, testing it locally, and deploying it to Baseten is a streamlined process with transparency into the underlying operations, logging, error detection (e.g. out-of-memory errors), and next-best-action suggestions. The process is designed for ML practitioners themselves: there's no need to learn Terraform, Docker, Kubernetes, or FastAPI.
Truss' role in model packaging
Read more about Truss in Towards Data Science or the Truss documentation and give it a star on GitHub to follow its development.
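To make the structure concrete, here is a minimal sketch of the model/model.py file inside a Truss, assuming the standard layout of a Model class with load and predict methods; the pickle file name and the use of the data directory are illustrative, not prescriptive.

# model/model.py inside a Truss (illustrative sketch)
import pickle


class Model:
    def __init__(self, **kwargs):
        # Truss passes runtime context (such as the bundled data directory) via kwargs
        self._data_dir = kwargs.get("data_dir")
        self._model = None

    def load(self):
        # Runs once when the model server starts; load weights or artifacts here
        with open(f"{self._data_dir}/model.pkl", "rb") as f:
            self._model = pickle.load(f)

    def predict(self, model_input):
        # Runs per request; return a JSON-serializable result
        return {"predictions": self._model.predict(model_input).tolist()}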

Model invocation

You can invoke deployed models in a few different ways depending on your use case; the simplest is a REST API call:
curl -X POST https://app.baseten.co/model_versions/SOME_VERSION/predict \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -d '{"prompt": "A dancing yeti"}'
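The same call can be made from any HTTP client. A minimal sketch in Python using the requests library, with the same placeholder version ID and API key as above:

import requests

resp = requests.post(
    "https://app.baseten.co/model_versions/SOME_VERSION/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"prompt": "A dancing yeti"},
)
print(resp.json())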
You can also invoke a model by building a worklet. Worklets let you follow the separation-of-concerns principle by decoupling invocation patterns and business logic from the models themselves. These components represent separate concerns, change in different places and potentially at the hands of different stakeholders, and it's therefore critical that they not be tightly coupled.
A worklet graph
For instance, the code that loads data from Snowflake and transforms it can live in a worklet, which can then call into the model without the model being explicitly aware of Snowflake. Worklets also let you schedule executions on a cadence and even call an ensemble of models in parallel without writing your own async or parallelization code.

Managing models

Once your model is deployed to Baseten, you automatically get model version management: each model can have an unlimited number of versions, each running independently as a separate microservice. You can invoke a specific model version, or invoke the model itself, which routes to whichever version is marked as primary. These capabilities let you experiment with multiple versions of the same model in production, run shadow models, and more.
Models deployed to Baseten can be tagged with a variety of information so your team can track the code and data used to train them, their hyperparameters, performance metrics, and so on. This traceability is critical when debugging issues that trace back to the training step.
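As a sketch of the two invocation paths in Python: the version-specific route matches the example above, while the model-level route (which resolves to the primary version) and the placeholder model ID are assumptions for illustration.

import requests

# Pin a specific version explicitly
requests.post(
    "https://app.baseten.co/model_versions/SOME_VERSION/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"prompt": "A dancing yeti"},
)

# Invoke the model itself; routes to the primary version (assumed endpoint)
requests.post(
    "https://app.baseten.co/models/SOME_MODEL_ID/predict",
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"prompt": "A dancing yeti"},
)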

Model observability

Every model's metrics and logs are automatically captured and presented through the model health metrics and model logs UI. Behind the scenes, Baseten uses Prometheus for metrics and Grafana Loki for logs. The raw data can also be fed into your internal systems if you choose.

Model auto-scaling

Each version of each model scales horizontally, automatically and independently. Scaling kicks in when the model is consuming resources (CPU, GPU, or memory) above a certain threshold, or is slow to respond and requests begin to queue. Under the hood, Baseten uses Kubernetes, Istio, KServe, and Knative to implement this functionality and meet SLAs.
Baseten product architecture

Data integrations

Your models can read from and write to a variety of data sources. Your Baseten workspace comes with hosted Postgres tables where you can define any schema and store data. You can also bring your own data via connections to resources like S3. Regardless of where you store your data, you can integrate it with your deployed ML models on Baseten without worrying about glue code.

Building user interfaces

Serving an ML model as an API is a great first step, but what if your model is intended for a non-technical end user like a customer service team looking for terms of service violations or a compliance team looking for fraud? Or maybe you want to build a public-facing application. From dashboards to demos, serving models to end users often requires UI.
An application page in the view builder
When you want to do more with your model than just invoke it, start by creating an application. Applications and models have a fully independent many-to-many relationship: you can have one without the other, a model can be used in multiple applications, and an application can use multiple models.
An application can have one or more views. You can build views with the drag-and-drop view builder and create front-end logic with Python functions and bindings. The view builder provides over a dozen components which you can assemble on the canvas to quickly create a clean, functional user interface.
Building a user-facing application for real production use cases requires a proper DevOps loop. Users in a workspace on our Starter and Business plans have access to a draft environment for testing changes without affecting live sites and GitHub sync for version control. Combined with model versions, you can iterate confidently on ML-powered applications and only ship to prod when you're ready.
User-facing applications can take advantage of the same hosted Postgres tables and data connections that models do by running queries to both read from and write to data sources. And for more complex interactions, your data sources are fully available to the backend of the application and can be accessed with a code block in any worklet.
In summary, you can build stateful user interfaces and full-stack applications with familiar tools like Python, SQL, and GitHub while never touching HTML, CSS, JavaScript, or any application build-and-deploy pipelines.

Security

Securing customers' data and the infrastructure that runs their code and models is paramount. Each customer's workload is isolated in a separate Kubernetes namespace, with strict network security policies on inter-namespace communication as well as Pod security policies enforced and monitored by Gatekeeper and Sysdig.
Baseten strongly encourages the best practice of keeping sensitive data away from code by providing multiple ways to store secrets securely.
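For instance, a model packaged with Truss can reference secrets by name and read them at runtime rather than hard-coding them. A minimal sketch, assuming secrets are injected into the Model class via its keyword arguments and that hf_api_token is a secret name you configured in your workspace:

# model/model.py (illustrative sketch)
class Model:
    def __init__(self, **kwargs):
        # Secrets configured in the workspace are injected at runtime,
        # so credentials never live in the model code itself
        self._secrets = kwargs.get("secrets", {})
        self._api_token = None

    def load(self):
        # Hypothetical secret name
        self._api_token = self._secrets["hf_api_token"]

    def predict(self, model_input):
        # Use self._api_token to call an upstream service here
        return {"ok": True}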
Baseten is SOC 2 Type 1 certified and will be Type 2 certified in Q4 2022.

Support

Everyone is invited to email our main support channel, [email protected], or use the Intercom widget within the product. These channels are monitored during Pacific Time business hours by senior engineers, Baseten founders, and other in-house product experts.
Customers in the Business tier get access to a shared Slack channel where their questions are answered within 4 business hours (Pacific Time) with most questions answered within minutes.
Business tier customers are also assigned a dedicated forward-deployed engineer for personalized, 1-on-1 technical support as needed.
Contact us at [email protected] for additional information on anything covered in these docs!