Welcome to Baseten!
Bring your models. We'll handle the rest.
Baseten is a machine learning infrastructure platform for serving models of any size and modality, with the performance, scalability, and cost efficiency that production use cases require.
All you need to get started deploying models with Baseten is an API key.
import baseten
baseten.login("YOUR_API_KEY")
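Rather than pasting the key into a script, you may prefer to read it from the environment. The following is a minimal sketch, not part of the Baseten client: the helper `read_api_key` and the variable name `BASETEN_API_KEY` are assumptions chosen for illustration.

```python
import os


def read_api_key(env=os.environ, var_name="BASETEN_API_KEY"):
    """Return the API key stored under `var_name`, or raise if it is missing.

    Both this helper and the variable name are hypothetical; use whatever
    name fits your own setup.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Baseten API key"
        )
    return key


# With the baseten package installed, the login call shown above becomes:
#   import baseten
#   baseten.login(read_api_key())
```

Keeping the key out of source files makes it harder to commit it to version control by accident.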
To kick off your exploration, every new account comes with free model resource credits.
You can deploy any model on Baseten:
- If you want to deploy a popular foundation model as-is, it's just two clicks away in the model library.
- For full control over the deployment and serving of an off-the-shelf model, clone a Truss of a foundation model from Baseten's GitHub.
- To deploy your own model, package it with Truss, an open-source model packaging library developed by Baseten.
Example deployment: WizardLM
Get the packaged model by cloning it from GitHub:
git clone https://github.com/basetenlabs/wizardlm-truss.git
In a Python script, load your model and call baseten.deploy():
import baseten
import truss
my_model = truss.load("wizardlm-truss")
baseten.deploy(my_model, model_name="WizardLM")
Get your model version ID from the console output:
...
BasetenDeployedModel<
model_version_id=qzk76xq
name=WizardLM
>
Then, invoke your model:
model = baseten.deployed_model_version_id('MODEL_VERSION_ID')
model.predict({"prompt": "What is the difference between a wizard and a sorcerer?"})
Deployed models can also be invoked via API call:
curl -X POST https://app.baseten.co/model_versions/MODEL_VERSION_ID/predict \
-H 'Authorization: Api-Key YOUR_API_KEY' \
-d '{"prompt": "What is the difference between a wizard and a sorcerer?"}'
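The same HTTP call can be sketched in Python using only the standard library. This mirrors the curl command above (same URL pattern, same Authorization header, JSON body). The function names `build_predict_request` and `predict` are hypothetical, and parsing the response as JSON is an assumption about what the endpoint returns.

```python
import json
import urllib.request


def build_predict_request(model_version_id, api_key, payload):
    """Build (but do not send) a POST request matching the curl example."""
    url = f"https://app.baseten.co/model_versions/{model_version_id}/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Api-Key {api_key}"},
        method="POST",
    )


def predict(model_version_id, api_key, payload, timeout=60):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_predict_request(model_version_id, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `predict("MODEL_VERSION_ID", "YOUR_API_KEY", {"prompt": "What is the difference between a wizard and a sorcerer?"})`, with the placeholders replaced by your own values.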
Get started deploying models:
Everyone is invited to email our main support channel — [email protected] — or send us a message by clicking the Support button within the product. These channels are monitored during Pacific Time business hours by senior engineers, Baseten founders, and other in-house product experts.
Workspaces on the Pro plan get a dedicated forward-deployed engineer in a shared Slack channel. Questions there are answered within 4 business hours (Pacific Time), and most are answered within minutes.