Quick start
Step 1: What modality are you working with?
Large language models: build and deploy large language models.
Step 2: Select a model or guide to get started.
Get started quickly by deploying a model from our library in seconds:
DeepSeek R1
Qwen 2.5 32B Coder
Llama 3.3 70B Instruct
Gemma 3 27B IT
Qwen 2.5 14B Instruct
Explore model library
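Once a library model is deployed, it is called over HTTPS. A minimal sketch in Python, assuming a hypothetical model ID and API key; the per-model subdomain and the `/production/predict` route follow Baseten's standard call pattern, and the payload shape depends on the model you deployed:

```python
import json
import urllib.request


def predict_url(model_id: str, deployment: str = "production") -> str:
    # Each Baseten model is served from its own subdomain; the deployment
    # segment ("production", "development", or a deployment ID) picks
    # which deployment answers the request.
    return f"https://model-{model_id}.api.baseten.co/{deployment}/predict"


def call_model(model_id: str, api_key: str, payload: dict) -> dict:
    # Authenticate with an API key from your Baseten workspace settings.
    req = urllib.request.Request(
        predict_url(model_id),
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `call_model("abc123", API_KEY, {"prompt": "Hello"})` posts a JSON body and returns the parsed JSON response; `abc123` here stands in for the model ID shown on your model's page.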
Or choose a step-by-step guide to help you get started:
Fast LLMs with TensorRT-LLM: optimize LLMs for low latency and high throughput.
Run any LLM with vLLM: serve a wide range of models.
Or learn the concepts of model development.