Secure model inference
Keeping your models safe and private
We take the security of your models and data seriously. Baseten maintains a SOC 2 Type II certification and HIPAA compliance, but we're aware that these certifications don't guarantee the security of the system. This doc provides a more specific look at Baseten's security posture.
Data privacy
Baseten is not in the business of using customers' data. We are in the business of providing ML inference infrastructure. We provide strong data privacy for your workloads.
Model inputs and outputs
By default, Baseten never stores models' inputs or outputs.
Model inputs sent via async inference are stored until the async request has been processed by the model. Model outputs from async requests are never stored.
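To make that lifecycle concrete, a minimal async request might look like the sketch below. The model ID and webhook URL are placeholders, and the endpoint path and field names (`async_predict`, `model_input`, `webhook_endpoint`) are our best recollection of the API shape; treat them as assumptions and check the API reference for the current details.

```python
# Hedged sketch of an async inference request. The model ID and webhook URL
# are placeholders; the endpoint path and field names are assumptions, so
# check the current API reference before relying on them.
import os

import requests

model_id = "abc123"  # placeholder model ID

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/async_predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        # This input is stored only until the model processes the request.
        "model_input": {"prompt": "What is ML inference?"},
        # The output is delivered to your webhook, never stored on Baseten.
        "webhook_endpoint": "https://example.com/inference-results",
    },
)
print(resp.json())  # typically an ID for correlating the webhook delivery
```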
Baseten used to offer, and maintains for existing users, a hosted Postgres data table system. A user could store model inputs and outputs in these data tables, which means they'd be stored on Baseten. Baseten's hosted Postgres data tables are secured with the same level of care as the rest of our infrastructure, and information in those tables can be permanently deleted by the user at any time.
Model weights
By default, Baseten does not store models' weights.
By default, when a model is loaded, the model weights are simply downloaded from the source of truth (e.g., a private Hugging Face repo, GCS, or S3) and moved from CPU memory to GPU memory (i.e., never stored on disk).
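To illustrate, a Truss model's `load()` step typically pulls weights directly from the source of truth into memory. The sketch below uses a hypothetical private Hugging Face repo and shows the general pattern, not Baseten's internal code.

```python
# Simplified sketch of a Truss model.py. The repo name is a placeholder;
# this illustrates the load-time flow, not Baseten's internal implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer


class Model:
    def __init__(self, **kwargs):
        self._model = None
        self._tokenizer = None

    def load(self):
        # Weights are fetched from the source of truth (here, Hugging Face)
        # at load time and placed on the GPU; Baseten does not persist them
        # by default.
        self._tokenizer = AutoTokenizer.from_pretrained("my-org/my-private-model")
        self._model = AutoModelForCausalLM.from_pretrained(
            "my-org/my-private-model"
        ).to("cuda")

    def predict(self, model_input):
        inputs = self._tokenizer(model_input["prompt"], return_tensors="pt").to("cuda")
        outputs = self._model.generate(**inputs, max_new_tokens=64)
        return {"output": self._tokenizer.decode(outputs[0], skip_special_tokens=True)}
```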
A user may explicitly instruct Baseten to store model weights using the caching mechanism in Truss. If a user stores weights on Baseten with this mechanism, they can request that those weights be permanently erased. Baseten will process these requests for any models specified by the user within one business day.
For open-source models from Baseten's model library, model weights are stored with this caching mechanism by default to speed up cold starts.
Additionally, Baseten uses a network accelerator that we developed to speed up model loads from common model artifact stores, including Hugging Face, S3, and GCS. Our accelerator employs byte-range downloads in the background to maximize download parallelism. If you prefer to disable this network acceleration for your Baseten workspace, please contact our support team at support@baseten.co.
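Byte-range parallelism is a standard download technique. The sketch below shows the general idea, splitting a file into slices fetched concurrently with HTTP `Range` headers; it is illustrative only, not our accelerator's implementation.

```python
# Illustrative sketch of parallel byte-range downloading -- the general
# technique, not the accelerator's actual code.
from concurrent.futures import ThreadPoolExecutor

import requests

CHUNK = 64 * 1024 * 1024  # 64 MiB per range


def fetch_range(url, start, end):
    # Fetch one slice of the file; the server must support Range requests.
    resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
    resp.raise_for_status()
    return start, resp.content


def parallel_download(url):
    size = int(requests.head(url, allow_redirects=True).headers["Content-Length"])
    ranges = [(s, min(s + CHUNK, size) - 1) for s in range(0, size, CHUNK)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        parts = pool.map(lambda r: fetch_range(url, *r), ranges)
    # Reassemble the slices in offset order.
    return b"".join(content for _, content in sorted(parts))
```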
Workload security
Baseten runs ML inference workloads on users' behalf. This necessitates creating the right level of isolation to protect users' workloads from each other, and to protect Baseten's core services from the users' workloads. This is achieved through:
- Container security via enforcing security policies and the principle of least privilege.
- Network security policies including giving each customer their own Kubernetes namespace.
- Keeping our infrastructure up-to-date with the latest security patches.
Container security
No two users' models share the same GPU. To mitigate container-related risks, we use security tooling such as Falco (via Sysdig) and pod security policies (via Gatekeeper), and we run pods in a security context with minimal privileges. Furthermore, we ensure that the nodes themselves don't have any privileges to affect other users' workloads. Nodes have the lowest possible privileges within the Kubernetes cluster in order to minimize the blast radius of a security incident.
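To illustrate what a minimal-privilege security context looks like in practice, here is a sketch using the official Kubernetes Python client. The specific values are examples in the spirit of least privilege, not our actual policy.

```python
# Illustrative sketch of a locked-down pod security context, built with the
# official Kubernetes Python client. Values are examples, not Baseten's
# actual configuration.
from kubernetes import client

security_context = client.V1SecurityContext(
    run_as_non_root=True,                # refuse to run as root
    allow_privilege_escalation=False,    # block setuid-style escalation
    read_only_root_filesystem=True,      # no writes to the root filesystem
    capabilities=client.V1Capabilities(drop=["ALL"]),  # drop all Linux capabilities
)

container = client.V1Container(
    name="model-server",
    image="example.com/model-server:latest",  # placeholder image
    security_context=security_context,
)
```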
Network security policies
There is a one-to-one relationship between a customer and a Kubernetes namespace: all of a customer's workloads live within that namespace.
We use this architecture to ensure isolation between customers' workloads, with network isolation enforced by Calico. Customers' workloads are further isolated at the network level from the rest of Baseten's infrastructure: the nodes run in a private subnet and are firewalled from public access.
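As an illustration of namespace-scoped isolation, a default-deny ingress policy of the kind Calico enforces looks roughly like the sketch below (again via the Kubernetes Python client; the namespace name is a placeholder, and this is not our actual policy).

```python
# Illustrative sketch of a default-deny ingress NetworkPolicy scoped to one
# customer's namespace. Not Baseten's actual policy.
from kubernetes import client

deny_all_ingress = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(
        name="default-deny-ingress",
        namespace="customer-a",  # hypothetical per-customer namespace
    ),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),  # empty selector = all pods
        policy_types=["Ingress"],               # no ingress rules -> deny all
    ),
)
```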
Extended pentesting
While some pentesting is required for the SOC 2 certification, Baseten exceeds these requirements in both the scope of pentests and the access we give our testers.
We've contracted with ex-OpenAI and CrowdStrike security experts at RunSybil to perform extended pentesting, including deploying malicious models on a dedicated prod-like Baseten environment, with the goal of breaking through the security measures described on this page.
Self-hosted model inference
We do offer:
- Single-tenant environments.
- Self-hosting Baseten within your own infrastructure.
Given the security measures we have already put in place, we recommend the cloud version of Baseten for most customers, as it provides faster setup, lower cost, and elastic GPU availability. However, if Baseten's self-hosted plan sounds right for your needs, please contact our support team at support@baseten.co.