Secure model inference
Keeping your models safe and private
We take the security of your models and data seriously. Baseten maintains a SOC 2 Type II certification and HIPAA compliance, but we're aware that these certifications don't guarantee the security of the system. This doc provides a more specific look at Baseten's security posture.
Data privacy
Baseten is not in the business of using customers' data. We are in the business of providing ML inference infrastructure. We provide strong data privacy for your workloads.
Model inputs and outputs
By default, Baseten never stores models' inputs or outputs.
Model inputs sent via async inference are stored until the async request has been processed by the model. Model outputs from async requests are never stored.
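To make that lifecycle concrete, a minimal async request might look like the sketch below. The model ID and webhook URL are placeholders, and the endpoint path and field names (`async_predict`, `model_input`, `webhook_endpoint`) are our best recollection of the API shape; treat them as assumptions and check the API reference for the current details.

```python
# Hedged sketch of an async inference request. The model ID and webhook URL
# are placeholders; the endpoint path and field names are assumptions, so
# check the current API reference before relying on them.
import os

import requests

model_id = "abc123"  # placeholder model ID

resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/async_predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        # This input is stored only until the model processes the request.
        "model_input": {"prompt": "What is ML inference?"},
        # The output is delivered to your webhook, never stored on Baseten.
        "webhook_endpoint": "https://example.com/inference-results",
    },
)
print(resp.json())  # typically an ID for correlating the webhook delivery
```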
Baseten used to offer, and maintains for existing users, a hosted Postgres data table system. A user could store model inputs and outputs in these data tables, which means they'd be stored on Baseten. Baseten's hosted Postgres data tables are secured with the same level of care as the rest of our infrastructure, and information in those tables can be permanently deleted by the user at any time.
Model weights
By default, Baseten does not store models' weights.
By default, when a model is loaded, the model weights are simply downloaded from the source of truth (e.g., a private Hugging Face repo, GCS, or S3) and moved from CPU memory to GPU memory (i.e., never stored on disk).
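To illustrate, a Truss model's `load()` step typically pulls weights directly from the source of truth into memory. The sketch below uses a hypothetical private Hugging Face repo and shows the general pattern, not Baseten's internal code.

```python
# Simplified sketch of a Truss model.py. The repo name is a placeholder;
# this illustrates the load-time flow, not Baseten's internal implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer


class Model:
    def __init__(self, **kwargs):
        self._model = None
        self._tokenizer = None

    def load(self):
        # Weights are fetched from the source of truth (here, Hugging Face)
        # at load time and placed on the GPU; Baseten does not persist them
        # by default.
        self._tokenizer = AutoTokenizer.from_pretrained("my-org/my-private-model")
        self._model = AutoModelForCausalLM.from_pretrained(
            "my-org/my-private-model"
        ).to("cuda")

    def predict(self, model_input):
        inputs = self._tokenizer(model_input["prompt"], return_tensors="pt").to("cuda")
        outputs = self._model.generate(**inputs, max_new_tokens=64)
        return {"output": self._tokenizer.decode(outputs[0], skip_special_tokens=True)}
```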
A user may explicitly instruct Baseten to store model weights using the caching mechanism in Truss. If a user stores weights on Baseten with this mechanism, they can request that those weights be permanently erased. Baseten will process these requests for any models specified by the user within one business day.
For open-source models from Baseten's model library, model weights are stored with this caching mechanism by default to speed up cold starts.
Additionally, Baseten uses a network accelerator that we developed to speed up model loads from common model artifact stores, including Hugging Face, S3, and GCS. Our accelerator employs byte-range downloads in the background to maximize download parallelism. If you prefer to disable this network acceleration for your Baseten workspace, please contact our support team at support@baseten.co.
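Byte-range parallelism is a standard download technique. The sketch below shows the general idea, splitting a file into slices fetched concurrently with HTTP `Range` headers; it is illustrative only, not our accelerator's implementation.

```python
# Illustrative sketch of parallel byte-range downloading -- the general
# technique, not the accelerator's actual code.
from concurrent.futures import ThreadPoolExecutor

import requests

CHUNK = 64 * 1024 * 1024  # 64 MiB per range


def fetch_range(url, start, end):
    # Fetch one slice of the file; the server must support Range requests.
    resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"})
    resp.raise_for_status()
    return start, resp.content


def parallel_download(url):
    size = int(requests.head(url, allow_redirects=True).headers["Content-Length"])
    ranges = [(s, min(s + CHUNK, size) - 1) for s in range(0, size, CHUNK)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        parts = pool.map(lambda r: fetch_range(url, *r), ranges)
    # Reassemble the slices in offset order.
    return b"".join(content for _, content in sorted(parts))
```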
Workload security
Baseten runs ML inference workloads on users' behalf. This necessitates creating the right level of isolation to protect users' workloads from each other, and to protect Baseten's core services from the users' workloads. This is achieved through:
- Container security via enforcing security policies and the principle of least privilege.
- Network security policies including giving each customer their own Kubernetes namespace.
- Keeping our infrastructure up-to-date with the latest security patches.
Container security
No two users' models share the same GPU. To mitigate container-related risks, we use security tooling such as Falco (via Sysdig) and pod security policies (via Gatekeeper), and we run pods in a security context with minimal privileges. Furthermore, we ensure that the nodes themselves don't have any privileges to affect other users' workloads. Nodes have the lowest possible privileges within the Kubernetes cluster in order to minimize the blast radius of a security incident.
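To illustrate what a minimal-privilege security context looks like in practice, here is a sketch using the official Kubernetes Python client. The specific values are examples in the spirit of least privilege, not our actual policy.

```python
# Illustrative sketch of a locked-down pod security context, built with the
# official Kubernetes Python client. Values are examples, not Baseten's
# actual configuration.
from kubernetes import client

security_context = client.V1SecurityContext(
    run_as_non_root=True,                # refuse to run as root
    allow_privilege_escalation=False,    # block setuid-style escalation
    read_only_root_filesystem=True,      # no writes to the root filesystem
    capabilities=client.V1Capabilities(drop=["ALL"]),  # drop all Linux capabilities
)

container = client.V1Container(
    name="model-server",
    image="example.com/model-server:latest",  # placeholder image
    security_context=security_context,
)
```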
Network security policies
There is a one-to-one relationship between a customer and a Kubernetes namespace: all of a customer's workloads live within that namespace.
We use this architecture to ensure isolation between customers' workloads, with network isolation enforced by Calico. Customers' workloads are further isolated at the network level from the rest of Baseten's infrastructure: the nodes run in a private subnet and are firewalled from public access.
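As an illustration of namespace-scoped isolation, a default-deny ingress policy of the kind Calico enforces looks roughly like the sketch below (again via the Kubernetes Python client; the namespace name is a placeholder, and this is not our actual policy).

```python
# Illustrative sketch of a default-deny ingress NetworkPolicy scoped to one
# customer's namespace. Not Baseten's actual policy.
from kubernetes import client

deny_all_ingress = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(
        name="default-deny-ingress",
        namespace="customer-a",  # hypothetical per-customer namespace
    ),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),  # empty selector = all pods
        policy_types=["Ingress"],               # no ingress rules -> deny all
    ),
)
```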
Extended pentesting
While some pentesting is required for the SOC 2 certification, Baseten exceeds these requirements in both the scope of pentests and the access we give our testers.
We've contracted with ex-OpenAI and CrowdStrike security experts at RunSybil to perform extended pentesting, including deploying malicious models on a dedicated prod-like Baseten environment, with the goal of breaking through the security measures described on this page.
Self-hosted model inference
We do offer:
- Single-tenant environments.
- Self-hosting Baseten within your own infrastructure.
Given the security measures we have already put in place, we recommend the cloud version of Baseten for most customers, as it provides faster setup, lower cost, and elastic GPU availability. However, if Baseten's self-hosted plan sounds right for your needs, please contact our support team at support@baseten.co.