We take the security of your models and data seriously. Baseten maintains a SOC 2 Type II certification and HIPAA compliance, but we’re aware that these certifications don’t guarantee the security of the system. This doc provides a more specific look at Baseten’s security posture.

Data privacy

Baseten is not in the business of using customers’ data. We are in the business of providing ML inference infrastructure. We provide strong data privacy for your workloads.

Model inputs and outputs

By default, Baseten never stores models’ inputs or outputs.

Model inputs sent via async inference are stored until the async request has been processed by the model. Model outputs from async requests are never stored.

Baseten used to offer, and maintains for existing users, a hosted Postgres data table system. A user could store model inputs and outputs in these data tables, which means they’d be stored on Baseten. Baseten’s hosted Postgres data tables are secured with the same level of care as the rest of our infrastructure, and information in those tables can be permanently deleted by the user at any time.

Model weights

By default, Baseten does not store models’ weights.

By default, when a model is loaded, the model weights are downloaded from the source of truth (e.g., a private Hugging Face repo, GCS, or S3) and moved from CPU memory to GPU memory. They are never stored on disk.

A user may explicitly instruct Baseten to store model weights using the caching mechanism in Truss. A user who stores weights on Baseten with this mechanism can request that those weights be permanently erased. Baseten will process such requests for any models the user specifies within 1 business day.

For open-source models from Baseten’s model library, model weights are stored with this caching mechanism by default to speed up cold starts.

Additionally, Baseten uses a network accelerator that we developed to speed up model loads from common model artifact stores, including Hugging Face, S3, and GCS. Our accelerator employs byte range downloads in the background to maximize the parallelism of downloads. If you prefer to disable this network acceleration for your Baseten workspace, please contact our support team at support@baseten.co.
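To illustrate the general idea of byte range downloads, here is a minimal Python sketch. It is not Baseten’s accelerator: the function names (`byte_ranges`, `range_header`, `parallel_download`), the chunk size, and the worker count are all assumptions made for this example. It splits a file of known size into ranges, builds the standard HTTP `Range` header for each one, and fetches the ranges concurrently before joining them in order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def byte_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) pairs, as HTTP Range expects."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_header(start: int, end: int) -> Dict[str, str]:
    """Build the standard HTTP header that requests bytes start..end inclusive."""
    return {"Range": f"bytes={start}-{end}"}


def parallel_download(
    fetch_range: Callable[[int, int], bytes],  # hypothetical: returns bytes start..end
    total_size: int,
    chunk_size: int,
    max_workers: int = 8,  # illustrative value, not a Baseten setting
) -> bytes:
    """Fetch every range concurrently and reassemble the parts in order."""
    ranges = byte_ranges(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the parts line up correctly.
        parts = list(pool.map(lambda r: fetch_range(*r), ranges))
    return b"".join(parts)
```

In a real client, `fetch_range` would issue a GET request carrying `range_header(start, end)` against the artifact store. Passing it in as a callable keeps the sketch independent of any particular HTTP library.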

Workload security

Baseten runs ML inference workloads on users’ behalf. This necessitates creating the right level of isolation to protect users’ workloads from each other, and to protect Baseten’s core services from the users’ workloads. This is achieved through:

  • Container security, enforced through security policies and the principle of least privilege.
  • Network security policies, including a dedicated Kubernetes namespace for each customer.
  • Keeping our infrastructure up-to-date with the latest security patches.

Container security

No two users’ models share the same GPU. To mitigate container-related risks, we use security tooling such as Falco (via Sysdig) and pod security policies (via Gatekeeper), and we run pods in a security context with minimal privileges. Furthermore, we ensure that the nodes themselves don’t have any privileges to affect other users’ workloads. Nodes have the lowest possible privileges within the Kubernetes cluster in order to minimize the blast radius of a security incident.
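As a rough illustration of what a security context with minimal privileges can look like, the Python sketch below builds a container-level Kubernetes `securityContext` and checks it for common privilege problems. The field names are standard Kubernetes fields, but the specific values and the helpers `minimal_security_context` and `violations` are assumptions made for this example, not Baseten’s actual configuration or policy set.

```python
from typing import Any, Dict, List


def minimal_security_context() -> Dict[str, Any]:
    """An illustrative least-privilege container securityContext (assumed values)."""
    return {
        "runAsNonRoot": True,               # refuse to start as UID 0
        "privileged": False,                # no access to host devices
        "allowPrivilegeEscalation": False,  # block setuid-style escalation
        "readOnlyRootFilesystem": True,     # container image cannot be modified
        "capabilities": {"drop": ["ALL"]},  # drop every Linux capability
        "seccompProfile": {"type": "RuntimeDefault"},
    }


def violations(ctx: Dict[str, Any]) -> List[str]:
    """Return the least-privilege checks that a securityContext fails."""
    problems = []
    if ctx.get("privileged", False):
        problems.append("privileged container")
    # Treat an unset field as a violation so the setting must be explicit.
    if ctx.get("allowPrivilegeEscalation", True):
        problems.append("privilege escalation allowed")
    if not ctx.get("runAsNonRoot", False):
        problems.append("may run as root")
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        problems.append("capabilities not dropped")
    return problems
```

An admission controller such as Gatekeeper enforces comparable rules at the cluster level by rejecting pods that violate them. The check above only mirrors that idea in plain Python.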

Network security policies

There is a one-to-one relationship between customers and Kubernetes namespaces: each customer has exactly one namespace, and all of that customer’s workloads live within it.

We use this architecture to isolate customers’ workloads from one another, with network isolation enforced by Calico. Customers’ workloads are further isolated at the network level from the rest of Baseten’s infrastructure: the nodes run in a private subnet and are firewalled from public access.
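The per-namespace isolation described here can be sketched with a standard Kubernetes NetworkPolicy, which Calico is able to enforce. The Python example below generates such a manifest for a given namespace. The helper `namespace_isolation_policy`, the policy name, and the example namespace are hypothetical, and Baseten’s real policies are not published here, so treat this as one possible shape rather than the actual rules.

```python
from typing import Any, Dict


def namespace_isolation_policy(namespace: str) -> Dict[str, Any]:
    """Build a NetworkPolicy that only admits ingress from pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "allow-same-namespace-only",  # hypothetical policy name
            "namespace": namespace,
        },
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" matches pods in this namespace only,
            # so traffic from any other customer's namespace is denied.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


# Example: one policy per customer namespace (the namespace name is made up).
policy = namespace_isolation_policy("customer-abc123")
```

A production setup would also need explicit rules for legitimate traffic from outside the namespace, such as an inference gateway. Those rules are omitted to keep the sketch short.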

Extended pentesting

While some pentesting is required for the SOC 2 certification, Baseten exceeds these requirements in both the scope of pentests and the access we give our testers.

We’ve contracted with ex-OpenAI and CrowdStrike security experts at RunSybil to perform extended pentesting. This includes deploying malicious models on a dedicated prod-like Baseten environment, with the goal of breaking through the security measures described on this page.

Self-hosted model inference

We also offer:

  • Single-tenant environments.
  • Self-hosting Baseten within your own infrastructure.

Given the security measures we have already put in place, we recommend the cloud version of Baseten for most customers, as it provides faster setup, lower cost, and elastic GPU availability. However, if Baseten’s self-hosted plan sounds right for your needs, please contact our support team at support@baseten.co.