Data privacy
Baseten does not store model inputs, outputs, or weights by default.
- Model inputs/outputs: Inputs for async inference are temporarily stored until processed. Outputs are never stored.
- Model weights: Loaded dynamically from sources like Hugging Face, GCS, or S3, moving directly to GPU memory.
- Weight caching: You can enable caching via Truss. Cached weights can be permanently deleted on request.
- Postgres data tables: Existing users may store data in Baseten’s hosted Postgres tables, which can be deleted at any time.
Viewing your compliance policy
If Baseten has set a compliance policy for your account, the policy appears in your Organization and Team settings under the General tab, and on the model environment detail view. The policy shows the boundaries your inference workloads run within:
- Framework: the compliance programs your workloads are restricted to.
- Region: the geographic regions where your workloads can run.
Workload security
Baseten isolates inference workloads to protect users and Baseten’s infrastructure.
- Container security:
  - Baseten never shares GPUs across users.
  - Security tooling: Falco (Sysdig), Gatekeeper (Pod Security Policies).
  - Minimal privileges for workloads and nodes to limit incident impact.
- Network security:
  - Each customer has a dedicated Kubernetes namespace.
  - Isolation enforced via Calico.
  - Nodes run in a private subnet with firewall protections.
- Pentesting:
  - Extended pentesting by RunSybil (ex-OpenAI and CrowdStrike experts).
  - Malicious model deployments tested in a dedicated prod-like environment.
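To make the per-customer namespace isolation under Network security more concrete, here is a minimal sketch of the kind of Kubernetes NetworkPolicy that a network plugin such as Calico can enforce. It is illustrative only and is not Baseten's actual configuration: the function name, the policy name, and the `customer-a` namespace are hypothetical. The manifest uses the standard `networking.k8s.io/v1` NetworkPolicy schema and admits ingress only from pods in the same namespace.

```python
import json


def same_namespace_only_policy(namespace: str) -> dict:
    """Build a NetworkPolicy manifest that admits ingress only from
    pods in the same namespace (hypothetical example, not Baseten's policy)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "allow-same-namespace-only",  # hypothetical policy name
            "namespace": namespace,
        },
        "spec": {
            # An empty podSelector applies the policy to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A "from" entry with only a podSelector (and no namespaceSelector)
            # matches pods in the policy's own namespace, so traffic from
            # other namespaces is not admitted.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    # "customer-a" is a placeholder namespace name.
    print(json.dumps(same_namespace_only_policy("customer-a"), indent=2))
```

The printed JSON could be applied with standard Kubernetes tooling. A real deployment would likely add egress rules and explicit exceptions (for example, for ingress controllers), which this sketch leaves out.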