Welcome to Baseten!
Bring your models. We'll handle the rest.
Baseten is a machine learning infrastructure platform for serving models of any size and modality performantly, scalably, and affordably in production.
All you need to get started deploying models with Baseten is an API key.
To kick off your exploration, every new account comes with free model resource credits.
You can deploy any model on Baseten:
- If you want to deploy a popular foundation model as-is, it's just two clicks away in the model library.
- For full control over the deployment and serving of an off-the-shelf model, clone a Truss of a foundation model from Baseten's GitHub.
- To deploy your own model, package it with Truss, an open-source model packaging library developed by Baseten.
To deploy a model, load it with Truss and call baseten.deploy():

import baseten
import truss

my_model = truss.load("~/path/to/my-model")

baseten.deploy(
    my_model,
    model_name="My wonderful model"
)
Deployed models can be invoked via API call:
curl -X POST https://app.baseten.co/models/MODEL_ID/predict \
-H 'Authorization: Api-Key YOUR_API_KEY' \
-d 'MODEL_INPUT'
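The same call can be made from Python. This is a minimal sketch using only the standard library; the endpoint and header format come from the curl example above, while the function names and the `model_id`/`api_key` placeholders are illustrative, not part of the Baseten SDK.

```python
import json
import urllib.request

BASE_URL = "https://app.baseten.co"

def predict_url(model_id: str) -> str:
    # Build the predict endpoint URL for a deployed model
    return f"{BASE_URL}/models/{model_id}/predict"

def invoke_model(model_id: str, api_key: str, model_input) -> dict:
    # POST the model input as JSON, authenticated with your API key
    req = urllib.request.Request(
        predict_url(model_id),
        data=json.dumps(model_input).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Replace `model_id` and `api_key` with the values from your workspace before calling `invoke_model`.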
But there's so much more you can do with your models on Baseten. You can build microservices via worklets or entire applications to test and use your models. And Baseten's hosted Postgres tables and data connections make it seamless to integrate your data with your model.
Securing users' data and the infrastructure that runs their code and models is paramount. Each user's workload is isolated in a separate Kubernetes namespace with strict network security policies governing inter-namespace communication, and Pod security policies are enforced and monitored by Gatekeeper and Sysdig.
Baseten strongly encourages the best practice of keeping sensitive data away from code by providing multiple ways to store secrets securely.
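As a sketch of that pattern (the class shape follows Truss's model convention, but the secret name and keys here are hypothetical examples, not a definitive Baseten API): the model receives secrets at load time instead of hard-coding them in source.

```python
class Model:
    """Illustrative Truss-style model that reads an API token from
    injected secrets rather than embedding it in the code."""

    def __init__(self, **kwargs):
        # Secrets declared in the model's config are passed in at load
        # time; "hf_api_token" is a hypothetical secret name.
        secrets = kwargs.get("secrets", {})
        self._hf_token = secrets.get("hf_api_token")

    def predict(self, model_input):
        if self._hf_token is None:
            raise RuntimeError("Secret 'hf_api_token' was not configured")
        # Use self._hf_token when calling the gated upstream service;
        # here we simply echo the input back.
        return {"prediction": model_input}
```

The benefit is that rotating a secret never requires a code change, and the token never appears in version control.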
We developed a network accelerator to speed up model loads from common model artifact stores, including HuggingFace, CloudFront, S3, and OpenAI. The accelerator uses byte-range downloads in the background to maximize download parallelism. To disable network acceleration for your Baseten workspace, contact our support team at [email protected].
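Baseten's accelerator is internal, but the underlying byte-range technique can be sketched in a few lines: split the file into contiguous ranges, fetch each with an HTTP Range header, and reassemble. Everything below is an illustrative simplification, not Baseten's implementation.

```python
import concurrent.futures
import urllib.request

def byte_ranges(total_size: int, num_chunks: int):
    """Split [0, total_size) into contiguous (start, end) byte ranges,
    with end inclusive, matching HTTP Range header semantics."""
    chunk = -(-total_size // num_chunks)  # ceiling division
    return [
        (start, min(start + chunk, total_size) - 1)
        for start in range(0, total_size, chunk)
    ]

def fetch_range(url: str, start: int, end: int) -> bytes:
    # Request only the bytes in [start, end] from the server
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def parallel_download(url: str, total_size: int, num_chunks: int = 8) -> bytes:
    # Fetch all ranges concurrently, then concatenate them in order
    ranges = byte_ranges(total_size, num_chunks)
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_chunks) as pool:
        parts = list(pool.map(lambda r: fetch_range(url, *r), ranges))
    return b"".join(parts)
```

This only works against servers that honor Range requests (S3 and most CDNs do), and a production version would also verify content length and retry failed chunks.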
Everyone is invited to email our main support channel — [email protected] — or send us a message by clicking the Support button within the product. These channels are monitored during Pacific Time business hours by senior engineers, Baseten founders, and other in-house product experts.
Workspaces on the Enterprise plan get access to a dedicated forward-deployed engineer in a shared Slack channel, where questions are answered within 4 business hours (Pacific Time) and most within minutes.