These examples cover a variety of use cases on Baseten, from deploying your first LLM or image generation model to transcription, embeddings, and RAG pipelines. Whether you’re optimizing inference with TensorRT-LLM or deploying a model with Truss, these guides help you build and scale efficiently.

Choosing the right engine

Not sure which engine to use? Check out our engine documentation to:
  • Select the appropriate engine for your model architecture (embeddings, dense LLMs, or MoE models)
  • Understand performance trade-offs between different engine options
  • Configure advanced features like quantization and speculative decoding
  • Optimize for your specific use case with engine-specific guidance

Deploy your first model

Fast LLMs with TensorRT-LLM

Run any LLM with vLLM

Deploy LLMs with SGLang

Transcribe audio with a Chain

Embeddings with BEI
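However you serve the model (TensorRT-LLM, vLLM, SGLang, or a custom Truss), calling it follows the same pattern: an authenticated HTTP POST to the deployment's predict endpoint. A minimal stdlib-only sketch; the model ID is hypothetical and the URL format is an assumption based on Baseten's standard production endpoint, so check your model's dashboard for the exact URL and payload schema:

```python
import json
import urllib.request


def predict_url(model_id: str) -> str:
    # Assumed production endpoint format (check your model dashboard).
    return f"https://model-{model_id}.api.baseten.co/environments/production/predict"


def predict(model_id: str, api_key: str, payload: dict) -> dict:
    # The payload schema depends on the model, e.g. {"prompt": ...} for an LLM.
    req = urllib.request.Request(
        predict_url(model_id),
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same two-line call works for any of the deployments above; only the payload changes between an LLM, a transcription model, and an embedding model.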

Training

Train and fine-tune models with Baseten’s scalable training infrastructure. From fine-tuning large language models to training custom models, our platform provides the tools and compute you need.

GPT OSS 20B with LoRA

Qwen3 8B LoRA DPO

Long Context Qwen3-30B

Coding with Qwen3-8B
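Several of the examples above fine-tune with LoRA, which freezes the base weight matrix W and learns only a low-rank update scaled by alpha / r, rather than a full weight delta. A toy sketch of the arithmetic in plain Python (dimensions, values, and scaling are illustrative, not taken from any example above):

```python
def matmul(a, b):
    # Naive matrix multiply for small illustrative matrices.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_forward(W, A, B, x, alpha=16, r=2):
    # Effective weight is W + (alpha / r) * (B @ A), where A is r x d_in and
    # B is d_out x r. In practice B @ A is never materialized; each path is
    # applied to the input x separately, as done here.
    base = matmul(W, x)
    low_rank = matmul(B, matmul(A, x))
    scale = alpha / r
    return [[base[i][j] + scale * low_rank[i][j]
             for j in range(len(base[0]))] for i in range(len(base))]
```

Because B is initialized to zero, the adapted model starts out identical to the base model, and only the small A and B matrices receive gradient updates during fine-tuning.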

Our training infrastructure supports popular frameworks including VERL, Megatron, and Unsloth, as well as models trained directly with Hugging Face Transformers.