We take the security of your models and data seriously. Baseten maintains a SOC 2 Type II certification and HIPAA compliance, but we're aware that these certifications don't guarantee the security of the system. This doc provides a more specific look at Baseten's security posture.

Data privacy

Baseten is not in the business of using customers' data. We are in the business of providing ML inference infrastructure. We provide strong data privacy for your workloads.

Model inputs and outputs

By default, Baseten never stores models' inputs or outputs.

Model inputs sent via async inference are stored until the async request has been processed by the model. Model outputs from async requests are never stored.

Baseten used to offer, and maintains for existing users, a hosted Postgres data table system. A user could store model inputs and outputs in these data tables, which means they'd be stored on Baseten. Baseten's hosted Postgres data tables are secured with the same level of care as the rest of our infrastructure, and information in those tables can be permanently deleted by the user at any time.

Model weights

By default, Baseten does not store models' weights.

By default, when a model is loaded, the model weights are simply downloaded from the source of truth (e.g. a private Hugging Face repo, GCS, or S3) into CPU memory and moved to GPU memory (i.e. never stored on disk).
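Conceptually, this flow looks like the following sketch. It is illustrative only (the fetcher, function names, and the torch call in the docstring are our assumptions, not Baseten's actual loading code): the serialized weights go straight into an in-memory buffer, which a framework can deserialize onto the GPU without the bytes ever touching disk.

```python
import io
from typing import Callable


def load_weights_in_memory(fetch: Callable[[], bytes]) -> io.BytesIO:
    """Fetch serialized weights straight into an in-memory buffer.

    The bytes never touch the local filesystem; a framework can then
    deserialize directly from the buffer onto the GPU, e.g.
    torch.load(buffer, map_location="cuda").
    """
    buffer = io.BytesIO(fetch())  # held only in CPU memory
    buffer.seek(0)
    return buffer


# Hypothetical fetcher standing in for a download from a private
# Hugging Face repo, GCS, or S3.
fake_weights = b"\x00" * 1024
buffer = load_weights_in_memory(lambda: fake_weights)
assert buffer.read() == fake_weights
```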

A user may explicitly instruct Baseten to store model weights using the caching mechanism in Truss. If a user stores weights on Baseten with this mechanism, they can request that those weights be permanently erased. Baseten will process these requests for any models specified by the user within one business day.
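Opting in to this caching is done in a model's Truss `config.yaml`. A minimal sketch, assuming the `model_cache` stanza (the repo id and file patterns below are illustrative):

```yaml
# config.yaml - explicitly opt in to weight caching on Baseten
# (repo_id is a hypothetical example)
model_cache:
  - repo_id: your-org/your-private-model
    allow_patterns:
      - "*.safetensors"
      - "*.json"
```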

For open-source models from Basetenā€™s model library, model weights are stored with this caching mechanism by default to speed up cold starts.

Additionally, Baseten uses a network accelerator that we developed to speed up model loads from common model artifact stores, including Hugging Face, S3, and GCS. Our accelerator employs byte range downloads in the background to maximize the parallelism of downloads. If you prefer to disable this network acceleration for your Baseten workspace, please contact our support team at support@baseten.co.
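The core idea behind byte-range parallelism can be sketched as follows. This is a simplified illustration of the general technique, not Baseten's accelerator: the object is split into ranges, each range is fetched concurrently with an HTTP `Range` header, and the chunks are reassembled in order.

```python
import concurrent.futures
import urllib.request


def byte_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split an object of total_size bytes into inclusive (start, end) ranges."""
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def fetch_range(url: str, start: int, end: int) -> bytes:
    # An HTTP Range header asks the server for just bytes [start, end].
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def parallel_download(url: str, total_size: int,
                      chunk_size: int = 8 * 1024 * 1024) -> bytes:
    """Download all ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(total_size, chunk_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        chunks = pool.map(lambda r: fetch_range(url, *r), ranges)
    return b"".join(chunks)
```

Object stores like S3 and GCS, and Hugging Face's CDN, all support range requests, which is what makes this kind of parallelism possible.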

Workload security

Baseten runs ML inference workloads on users' behalf. This necessitates creating the right level of isolation to protect users' workloads from each other, and to protect Baseten's core services from the users' workloads. This is achieved through:

  • Container security via enforcing security policies and the principle of least privilege.
  • Network security policies including giving each customer their own Kubernetes namespace.
  • Keeping our infrastructure up-to-date with the latest security patches.

Container security

No two users' models share the same GPU. To mitigate container-related risks, we use security tooling such as Falco (via Sysdig) and pod security policies (via Gatekeeper), and we run pods in a security context with minimal privileges. Furthermore, we ensure that the nodes themselves don't have any privileges to affect other users' workloads. Nodes have the lowest possible privileges within the Kubernetes cluster in order to minimize the blast radius of a security incident.
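A minimal-privilege container security context of the kind described above might look like this (an illustrative Kubernetes fragment, not Baseten's actual manifest):

```yaml
# Illustrative container-level securityContext with minimal privileges
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```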

Network security policies

There is a one-to-one relationship between a customer and a Kubernetes namespace: all of a customer's workloads live within that namespace.

This architecture isolates customers' workloads from one another through network policies enforced by Calico. Customers' workloads are further isolated at the network level from the rest of Baseten's infrastructure: the nodes run in a private subnet and are firewalled from public access.
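As a sketch of the kind of per-namespace isolation Calico can enforce, a default-deny policy scoped to one customer's namespace might look like this (illustrative only; the namespace name is hypothetical and this is not Baseten's actual policy):

```yaml
# Illustrative default-deny NetworkPolicy for a single customer namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: customer-a   # hypothetical customer namespace
spec:
  podSelector: {}         # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
```

With no allow rules defined, all ingress and egress traffic for pods in the namespace is denied; specific allowances are then layered on top.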

Extended pentesting

While some pentesting is required for the SOC 2 certification, Baseten exceeds these requirements in both the scope of pentests and the access we give our testers.

We've contracted with ex-OpenAI and CrowdStrike security experts at RunSybil to perform extended pentesting, including deploying malicious models in a dedicated prod-like Baseten environment, with the goal of breaking through the security measures described on this page.

Self-hosted model inference

We do offer:

  • Single-tenant environments.
  • Self-hosting Baseten within your own infrastructure.

Given the security measures we have already put in place, we recommend the cloud version of Baseten for most customers, as it provides faster setup, lower cost, and elastic GPU availability. However, if Baseten's self-hosted plan sounds right for your needs, please contact our support team at support@baseten.co.