Security
Secure model inference
Keeping your models safe and private
Baseten maintains SOC 2 Type II certification and HIPAA compliance, with robust security measures that go beyond compliance requirements.
Data privacy
Baseten does not store model inputs, outputs, or weights by default.
Model inputs/outputs: Inputs for async inference are temporarily stored until processed. Outputs are never stored.
Model weights: Loaded dynamically from sources like Hugging Face, GCS, or S3, moving directly to GPU memory. Users can enable weight caching via Truss, and cached weights can be permanently deleted on request.
Postgres data tables: Existing users may store data in Baseten's hosted Postgres tables, which can be deleted at any time.
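The Truss weight caching mentioned above is set in a model's config.yaml. A minimal sketch, assuming a Hugging Face source; the repo ID is illustrative, and the exact `model_cache` schema should be checked against the Truss documentation:

```yaml
# config.yaml (Truss) — illustrative snippet
model_cache:
  - repo_id: meta-llama/Llama-3.1-8B-Instruct  # hypothetical repo
    allow_patterns:
      - "*.safetensors"
      - "*.json"
```

With caching enabled, weights are fetched once rather than on every cold start; per the policy above, cached weights can still be permanently deleted on request.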
Baseten’s network accelerator optimizes model downloads. Contact support to disable it.
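The async inference behavior described above (inputs stored only until processed, outputs delivered to your webhook and never stored) applies to requests like the following sketch. The model ID and webhook URL are placeholders, and the endpoint path and payload shape are assumptions based on Baseten's async inference docs; verify them against the current API reference:

```python
import json
import os
import urllib.request

MODEL_ID = "abcd1234"  # hypothetical model ID for illustration
url = f"https://model-{MODEL_ID}.api.baseten.co/production/async_predict"

# model_input is held only until the request is processed; the output
# goes to webhook_endpoint and is not stored by Baseten.
payload = {
    "model_input": {"prompt": "Hello!"},
    "webhook_endpoint": "https://example.com/webhook",  # your receiver
}

api_key = os.environ.get("BASETEN_API_KEY")
if api_key:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())  # contains the async request ID
else:
    # No key set: just show the request that would be sent.
    print(json.dumps(payload))
```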
Workload security
Inference workloads are isolated to protect both users and Baseten's infrastructure.
Container security:
No GPUs are shared across users.
Security tooling: Falco (Sysdig) and Gatekeeper (Pod Security Policies).
Workloads and nodes run with minimal privileges to limit the impact of any incident.
Network security:
Each customer has a dedicated Kubernetes namespace.
Isolation is enforced via Calico.
Nodes run in a private subnet with firewall protections.
Pentesting:
Extended pentesting is performed by RunSybil (ex-OpenAI and CrowdStrike experts).
Malicious model deployments are tested in a dedicated prod-like environment.
Self-hosted model inference
Baseten offers single-tenant environments and self-hosted deployments. The cloud version is recommended for ease of setup, cost efficiency, and elastic GPU access.
For self-hosting, contact support.