Model deployment
Deploy from model library
Deploy a foundation model in two clicks
Baseten’s model library is the fastest way to get started with dozens of popular foundation models like Llama 2, Stable Diffusion and Whisper.
Deploy a library model
From the Baseten UI
You can deploy any model in the model library from the explore page in just a couple of clicks.
Models deployed from the web UI are published and trusted by default, so they have full access to autoscaling options and workspace secrets.
With the Truss CLI
Model library inference guide
Models in the model library follow a set of standards for consistency and predictable performance.
Configuration standards
- Each model uses the least expensive instance type that can reliably run it. For example, Stable Diffusion XL uses an A10-based instance even though an A100-based instance would be faster. You can adjust the instance type from the model dashboard after deployment.
- Models like Llama 2 that rely on authentication with Hugging Face require the secret `hf_access_token` to be set in your account secrets.
Input standards
- All models take a dictionary as input.
- For models like LLMs and Stable Diffusion that take a text input, the key for that input is `prompt`.
- Transformers-based models like LLMs support arguments from the transformers `GenerationConfig` object (see the example after this list).
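As a concrete sketch of these input standards, here is one way you might call a deployed library LLM with Python. The model ID, the `BASETEN_API_KEY` environment variable, and the specific generation arguments shown are illustrative assumptions; check your model's dashboard for its exact endpoint and its README for the parameters it supports.

```python
import os

import requests

# Hypothetical model ID; find yours in the model dashboard.
MODEL_ID = "abcd1234"

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        # Text input goes under the standard `prompt` key.
        "prompt": "What is the capital of France?",
        # Generation arguments follow transformers' GenerationConfig names;
        # exact support varies by model.
        "max_new_tokens": 256,
        "temperature": 0.7,
    },
)
resp.raise_for_status()
print(resp.json())
```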
Output standards
- Any generated image or audio will be returned using base64 encoding.
- All LLMs support streaming. To stream a response, pass `"stream": true` in the input dictionary (see the streaming sketch after this list).
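Under the same assumptions as above (placeholder model ID, API key in an environment variable), a minimal streaming sketch looks like this. For image and audio models, the base64-encoded payload in the response can be decoded with Python's standard `base64` module; the key that holds it varies by model, so check the model's README.

```python
import os

import requests

MODEL_ID = "abcd1234"  # hypothetical; use your deployed model's ID

with requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"prompt": "Write a haiku about GPUs.", "stream": True},
    stream=True,  # ask requests not to buffer the whole response
) as resp:
    resp.raise_for_status()
    # Print each chunk of generated text as it arrives.
    for chunk in resp.iter_content(chunk_size=None):
        print(chunk.decode("utf-8"), end="", flush=True)
```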