# Baseten

## Docs

- [How Baseten works](https://docs.baseten.co/concepts/howbasetenworks.md): Baseten is a platform designed to make deploying, serving, and scaling AI models seamless.
- [Why Baseten](https://docs.baseten.co/concepts/whybaseten.md): Baseten delivers fast, scalable AI/ML inference with enterprise-grade security and reliability, whether in our cloud or yours.
- [Autoscaling](https://docs.baseten.co/deployment/autoscaling.md): Autoscaling dynamically adjusts the number of active replicas to handle variable traffic while minimizing idle compute costs.
- [Concepts](https://docs.baseten.co/deployment/concepts.md)
- [Deployments](https://docs.baseten.co/deployment/deployments.md): Deploy, manage, and scale machine learning models with Baseten
- [Environments](https://docs.baseten.co/deployment/environments.md): Manage your model’s release cycles with environments.
- [Resources](https://docs.baseten.co/deployment/resources.md): Manage and configure model resources
- [Binary IO](https://docs.baseten.co/development/chain/binaryio.md): Performant serialization of numeric data
- [Concepts](https://docs.baseten.co/development/chain/concepts.md): Glossary of Chains concepts and terminology
- [Deploy](https://docs.baseten.co/development/chain/deploy.md): Deploy your Chain on Baseten
- [Architecture & Design](https://docs.baseten.co/development/chain/design.md): How to structure your Chainlets
- [Engine Builder Models](https://docs.baseten.co/development/chain/engine-builder-models.md): Engine Builder models are pre-trained models optimized for specific inference tasks.
- [Error Handling](https://docs.baseten.co/development/chain/errorhandling.md): Understanding and handling Chains errors
- [Your first Chain](https://docs.baseten.co/development/chain/getting-started.md): Build and deploy two example Chains
- [Invocation](https://docs.baseten.co/development/chain/invocation.md): Call your deployed Chain
- [Local Development](https://docs.baseten.co/development/chain/localdev.md): Iterating, debugging, testing, and mocking
- [Overview](https://docs.baseten.co/development/chain/overview.md)
- [Streaming](https://docs.baseten.co/development/chain/streaming.md): Streaming outputs, reducing latency, SSEs
- [Truss Integration](https://docs.baseten.co/development/chain/stub.md): Integrate deployed Truss models with stubs
- [Subclassing](https://docs.baseten.co/development/chain/subclassing.md): Modularize and re-use Chainlet implementations
- [Watch](https://docs.baseten.co/development/chain/watch.md): Live-patch deployed code
- [Concepts](https://docs.baseten.co/development/concepts.md)
- [Base Docker images](https://docs.baseten.co/development/model/base-images.md): A guide to configuring a base image for your truss
- [Custom build commands](https://docs.baseten.co/development/model/build-commands.md): How to run your own Docker commands during the build stage
- [Python-driven configuration for models 🆕](https://docs.baseten.co/development/model/code-first-development.md): Use code-first development tools to streamline model production.
- [Configuration](https://docs.baseten.co/development/model/configuration.md): How to configure your model.
- [Custom health checks 🆕](https://docs.baseten.co/development/model/custom-health-checks.md): Customize health checks for your deployments.
- [Deploy Custom Servers from Docker Images](https://docs.baseten.co/development/model/custom-server.md): A config.yaml is all you need
- [Data and storage](https://docs.baseten.co/development/model/data-directory.md): Load model weights without Hugging Face or S3
- [Deploy & Iterate](https://docs.baseten.co/development/model/deploy-and-iterate.md): Deploy your model and quickly iterate on it.
- [Access model environments](https://docs.baseten.co/development/model/environments.md): A guide to leveraging environments in your models
- [Implementation](https://docs.baseten.co/development/model/implementation.md): How to implement your model.
- [Cached Weights 🆕](https://docs.baseten.co/development/model/model-cache.md): Accelerate cold starts and availability by prefetching and caching your weights.
- [Developing a Model on Baseten](https://docs.baseten.co/development/model/overview.md): This page introduces the key concepts and workflow you'll use to package, configure, and iterate on models using Baseten’s developer tooling.
- [Concepts](https://docs.baseten.co/development/model/performance/concepts.md): Improve your latency and throughput
- [Request concurrency](https://docs.baseten.co/development/model/performance/concurrency.md): A guide to setting concurrency for your model
- [Engine Builder configuration](https://docs.baseten.co/development/model/performance/engine-builder-config.md): Configure your TensorRT-LLM inference engine
- [Engine control in Python](https://docs.baseten.co/development/model/performance/engine-builder-customization.md): Use `model.py` to customize engine behavior
- [Engine Builder overview](https://docs.baseten.co/development/model/performance/engine-builder-overview.md): Deploy optimized model inference servers in minutes
- [Private Docker Registries](https://docs.baseten.co/development/model/private-registries.md): A guide to configuring a private container registry for your truss
- [Using request objects / cancellation](https://docs.baseten.co/development/model/requests.md): Get more control by directly using the request object.
- [Custom Responses](https://docs.baseten.co/development/model/responses.md): Get more control by directly creating the response object.
- [Security & Secrets](https://docs.baseten.co/development/model/secrets.md): Using secrets securely in your ML models
- [Streaming output](https://docs.baseten.co/development/model/streaming.md): Streaming output for LLMs
- [Embeddings with BEI](https://docs.baseten.co/examples/bei.md): Serve embedding, reranking, and classification models
- [Transcribe audio with Chains](https://docs.baseten.co/examples/chains-audio-transcription.md): Process hours of audio in seconds using efficient chunking, distributed inference, and optimized GPU resources.
- [RAG pipeline with Chains](https://docs.baseten.co/examples/chains-build-rag.md): Build a RAG (retrieval-augmented generation) pipeline with Chains
- [Deploy a ComfyUI project](https://docs.baseten.co/examples/comfyui.md): Deploy your ComfyUI workflow as an API endpoint
- [Deploy your first model](https://docs.baseten.co/examples/deploy-your-first-model.md): From model weights to API endpoint
- [Dockerized model](https://docs.baseten.co/examples/docker.md): Deploy any model in a pre-built Docker container
- [Image generation](https://docs.baseten.co/examples/image-generation.md): Building a text-to-image model with Flux Schnell
- [Deepseek R1](https://docs.baseten.co/examples/models/deepseek/deepseek-r1.md): A state-of-the-art 671B-parameter MoE LLM with o1-style reasoning, licensed for commercial use
- [DeepSeek-R1 Qwen 7B](https://docs.baseten.co/examples/models/deepseek/deepseek-r1-qwen-7b.md): Qwen 7B fine-tuned with DeepSeek R1 for CoT reasoning capabilities
- [Flux-Schnell](https://docs.baseten.co/examples/models/flux/flux-schnell.md): Flux-Schnell is a state-of-the-art image generation model
- [Gemma 3 27B IT](https://docs.baseten.co/examples/models/gemma/gemma-3-27b-it.md): Instruct-tuned open model by Google with excellent ELO/size tradeoff and vision capabilities
- [Kokoro](https://docs.baseten.co/examples/models/kokoro/kokoro.md): Kokoro is a frontier TTS model for its size (82 million parameters; text in, audio out).
- [Llama 3.3 70B Instruct](https://docs.baseten.co/examples/models/llama/llama-3.3-70B-instruct.md): Llama 3.3 70B Instruct is a large language model optimized for instruction following.
- [MARS6](https://docs.baseten.co/examples/models/mars/MARS6.md): MARS6 is a frontier text-to-speech model by CAMB.AI with voice/prosody cloning capabilities in 10 languages. MARS6 must be licensed for commercial use; we can help!
- [All MPNet Base V2](https://docs.baseten.co/examples/models/microsoft/all-mpnet-base-v2.md): A text embedding model with a context window of 384 tokens and an embedding dimensionality of 768.
- [Nomic Embed v1.5](https://docs.baseten.co/examples/models/nomic/nomic-embed-v1-5.md): SOTA text embedding model with variable dimensionality — outperforms OpenAI text-embedding-ada-002 and text-embedding-3-small models.
- [Overview](https://docs.baseten.co/examples/models/overview.md): Browse our library of open source models that are ready to deploy behind an API endpoint in seconds.
- [Qwen-2-5-32B-Coder-Instruct](https://docs.baseten.co/examples/models/qwen/qwen-2-5-32b-coder-instruct.md): Qwen 2.5 32B Coder is an OpenAI-compatible model and can be called using the OpenAI SDK in any language.
- [SDXL Lightning](https://docs.baseten.co/examples/models/stable-diffusion/sdxl-lightning.md): A variant of Stable Diffusion XL that generates 1024x1024 px images in 4 UNet steps, enabling near real-time image creation.
- [Whisper V3](https://docs.baseten.co/examples/models/whisper/whisper-v3-fastest.md): Whisper V3 is a fast and accurate speech recognition model.
- [Building with Baseten](https://docs.baseten.co/examples/overview.md)
- [Deploy LLMs with SGLang](https://docs.baseten.co/examples/sglang.md): Optimized inference for LLMs with SGLang
- [LLM with Streaming](https://docs.baseten.co/examples/streaming.md): Building an LLM with streaming output
- [Fast LLMs with TensorRT-LLM](https://docs.baseten.co/examples/tensorrt-llm.md): Optimize LLMs for low latency and high throughput
- [Text to speech](https://docs.baseten.co/examples/text-to-speech.md): Building a text-to-speech model with Kokoro
- [Run any LLM with vLLM](https://docs.baseten.co/examples/vllm.md): Serve a wide range of models
- [Async inference](https://docs.baseten.co/inference/async.md): Run asynchronous inference on deployed models
- [Call your model](https://docs.baseten.co/inference/calling-your-model.md): Run inference on deployed models
- [Concepts](https://docs.baseten.co/inference/concepts.md)
- [Function calling (tool use)](https://docs.baseten.co/inference/function-calling.md): Use an LLM to select amongst provided tools
- [Integrations](https://docs.baseten.co/inference/integrations.md): Integrate your models with tools like LangChain, LiteLLM, and more.
- [Model I/O in binary](https://docs.baseten.co/inference/output-format/binary.md): Decode and save binary model output
- [Model I/O with files](https://docs.baseten.co/inference/output-format/files.md): Call models by passing a file or URL
- [Streaming](https://docs.baseten.co/inference/streaming.md): How to call a model that has a streaming-capable endpoint.
- [Structured output (JSON mode)](https://docs.baseten.co/inference/structured-output.md): Enforce an output schema on LLM inference
- [Workspace access control](https://docs.baseten.co/observability/access.md): Workspaces use role-based access control (RBAC) with two roles.
- [Best practices for API keys](https://docs.baseten.co/observability/api-keys.md): Securely access your Baseten models
- [Export to Datadog](https://docs.baseten.co/observability/export-metrics/datadog.md): Export metrics from Baseten to Datadog
- [Export to Grafana Cloud](https://docs.baseten.co/observability/export-metrics/grafana.md): Export metrics from Baseten to Grafana Cloud
- [Export to New Relic](https://docs.baseten.co/observability/export-metrics/new-relic.md): Export metrics from Baseten to New Relic
- [Overview](https://docs.baseten.co/observability/export-metrics/overview.md): Export metrics from Baseten to your observability stack
- [Export to Prometheus](https://docs.baseten.co/observability/export-metrics/prometheus.md): Export metrics from Baseten to Prometheus
- [Metrics support matrix](https://docs.baseten.co/observability/export-metrics/supported-metrics.md): Which metrics can be exported
- [Status and health](https://docs.baseten.co/observability/health.md): Every model deployment in your Baseten workspace has a status to represent its activity and health.
- [Metrics](https://docs.baseten.co/observability/metrics.md): Understand the load and performance of your model
- [Best practices for secrets](https://docs.baseten.co/observability/secrets.md): Securely store and access passwords, tokens, keys, and more
- [Secure model inference](https://docs.baseten.co/observability/security.md): Keeping your models safe and private
- [Tracing](https://docs.baseten.co/observability/tracing.md): Investigate the prediction flow in detail
- [Billing and usage](https://docs.baseten.co/observability/usage.md): Manage payments and track overall Baseten usage
- [Documentation](https://docs.baseten.co/overview.md): Baseten is a platform for deploying and serving AI models performantly, scalably, and cost-efficiently.
- [Quick start](https://docs.baseten.co/quickstart.md)
- [Chains CLI reference](https://docs.baseten.co/reference/cli/chains/chains-cli.md): Deploy, manage, and develop Chains using the Truss CLI.
- [truss cleanup](https://docs.baseten.co/reference/cli/truss/cleanup.md): Clean up truss data.
- [truss container](https://docs.baseten.co/reference/cli/truss/container.md): Subcommands for truss container.
- [truss image](https://docs.baseten.co/reference/cli/truss/image.md): Subcommands for truss image.
- [truss init](https://docs.baseten.co/reference/cli/truss/init.md): Create a new Truss.
- [truss login](https://docs.baseten.co/reference/cli/truss/login.md): Authenticate with Baseten.
- [Overview](https://docs.baseten.co/reference/cli/truss/overview.md): Details on Truss CLI and configuration options
- [truss predict](https://docs.baseten.co/reference/cli/truss/predict.md): Invokes the packaged model.
- [truss push](https://docs.baseten.co/reference/cli/truss/push.md): Pushes a truss to a TrussRemote.
- [truss run-python](https://docs.baseten.co/reference/cli/truss/run-python.md): Subcommands for truss run-python.
- [truss watch](https://docs.baseten.co/reference/cli/truss/watch.md): Seamless remote development with truss.
- [Overview](https://docs.baseten.co/reference/inference-api/overview.md): The inference API is used to call deployed models and chains.
- [Async cancel request](https://docs.baseten.co/reference/inference-api/predict-endpoints/cancel-async-request.md): Use this endpoint to cancel a queued async request.
- [Async deployment](https://docs.baseten.co/reference/inference-api/predict-endpoints/deployment-async-predict.md): Use this endpoint to call any [published deployment](/deploy/lifecycle) of your model.
- [Async chains deployment](https://docs.baseten.co/reference/inference-api/predict-endpoints/deployment-async-run-remote.md)
- [Deployment](https://docs.baseten.co/reference/inference-api/predict-endpoints/deployment-predict.md)
- [Chains deployment](https://docs.baseten.co/reference/inference-api/predict-endpoints/deployment-run-remote.md)
- [Async development](https://docs.baseten.co/reference/inference-api/predict-endpoints/development-async-predict.md): Use this endpoint to call the [development deployment](/deploy/lifecycle) of your model asynchronously.
- [Async chains development](https://docs.baseten.co/reference/inference-api/predict-endpoints/development-async-run-remote.md)
- [Development](https://docs.baseten.co/reference/inference-api/predict-endpoints/development-predict.md)
- [Chains development](https://docs.baseten.co/reference/inference-api/predict-endpoints/development-run-remote.md)
- [Async environment](https://docs.baseten.co/reference/inference-api/predict-endpoints/environments-async-predict.md): Use this endpoint to call the model associated with the specified environment asynchronously.
- [Async chains environment](https://docs.baseten.co/reference/inference-api/predict-endpoints/environments-async-run-remote.md): Use this endpoint to call the deployment associated with the specified environment asynchronously.
- [Environment](https://docs.baseten.co/reference/inference-api/predict-endpoints/environments-predict.md)
- [Chains environment](https://docs.baseten.co/reference/inference-api/predict-endpoints/environments-run-remote.md): Use this endpoint to call the deployment associated with the specified environment.
- [Async deployment](https://docs.baseten.co/reference/inference-api/status-endpoints/deployment-get-async-queue-status.md): Use this endpoint to get the status of a published deployment's async queue.
- [Async development](https://docs.baseten.co/reference/inference-api/status-endpoints/development-get-async-queue-status.md): Use this endpoint to get the status of a development deployment's async queue.
- [Async environment](https://docs.baseten.co/reference/inference-api/status-endpoints/environments-get-async-queue-status.md): Use this endpoint to get the async queue status for a model associated with the specified environment.
- [Async request](https://docs.baseten.co/reference/inference-api/status-endpoints/get-async-request-status.md): Use this endpoint to get the status of an async request.
- [Deployment](https://docs.baseten.co/reference/inference-api/wake/deployment-wake.md)
- [Development](https://docs.baseten.co/reference/inference-api/wake/development-wake.md)
- [Production](https://docs.baseten.co/reference/inference-api/wake/production-wake.md)
- [Delete chains](https://docs.baseten.co/reference/management-api/chains/deletes-a-chain-by-id.md)
- [By ID](https://docs.baseten.co/reference/management-api/chains/gets-a-chain-by-id.md)
- [All chains](https://docs.baseten.co/reference/management-api/chains/gets-all-chains.md)
- [Any deployment by ID](https://docs.baseten.co/reference/management-api/deployments/activate/activates-a-deployment.md): Activates an inactive deployment and returns the activation status.
- [Activate environment deployment](https://docs.baseten.co/reference/management-api/deployments/activate/activates-a-deployment-associated-with-an-environment.md): Activates an inactive deployment associated with an environment and returns the activation status.
- [Development deployment](https://docs.baseten.co/reference/management-api/deployments/activate/activates-a-development-deployment.md): Activates an inactive development deployment and returns the activation status.
- [Update chainlet environment's autoscaling settings](https://docs.baseten.co/reference/management-api/deployments/autoscaling/update-a-chainlet-environments-autoscaling-settings.md): Updates a chainlet environment's autoscaling settings and returns the updated chainlet environment settings.
- [Any model deployment by ID](https://docs.baseten.co/reference/management-api/deployments/autoscaling/updates-a-deployments-autoscaling-settings.md): Updates a deployment's autoscaling settings and returns the update status.
- [Development model deployment](https://docs.baseten.co/reference/management-api/deployments/autoscaling/updates-a-development-deployments-autoscaling-settings.md): Updates a development deployment's autoscaling settings and returns the update status.
- [Production model deployment](https://docs.baseten.co/reference/management-api/deployments/autoscaling/updates-a-production-deployments-autoscaling-settings.md): Updates a production deployment's autoscaling settings and returns the update status.
- [Any deployment by ID](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-a-deployment.md): Deactivates a deployment and returns the deactivation status.
- [Deactivate environment deployment](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-a-deployment-associated-with-an-environment.md): Deactivates a deployment associated with an environment and returns the deactivation status.
- [Development deployment](https://docs.baseten.co/reference/management-api/deployments/deactivate/deactivates-a-development-deployment.md): Deactivates a development deployment and returns the deactivation status.
- [Delete chain deployment](https://docs.baseten.co/reference/management-api/deployments/deletes-a-chain-deployment-by-id.md)
- [Delete model deployments](https://docs.baseten.co/reference/management-api/deployments/deletes-a-models-deployment-by-id.md): Deletes a model's deployment by ID and returns the tombstone of the deployment.
- [Any chain deployment by ID](https://docs.baseten.co/reference/management-api/deployments/gets-a-chain-deployment-by-id.md)
- [Any model deployment by ID](https://docs.baseten.co/reference/management-api/deployments/gets-a-models-deployment-by-id.md): Gets a model's deployment by ID and returns the deployment.
- [Development model deployment](https://docs.baseten.co/reference/management-api/deployments/gets-a-models-development-deployment.md): Gets a model's development deployment and returns the deployment.
- [Production model deployment](https://docs.baseten.co/reference/management-api/deployments/gets-a-models-production-deployment.md): Gets a model's production deployment and returns the deployment.
- [Get all chain deployments](https://docs.baseten.co/reference/management-api/deployments/gets-all-chain-deployments.md)
- [Get all model deployments](https://docs.baseten.co/reference/management-api/deployments/gets-all-deployments-of-a-model.md)
- [Promote to chain environment](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-chain-deployment-to-an-environment.md): Promotes an existing chain deployment to an environment and returns the promoted chain deployment.
- [Promote to model environment](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-deployment-to-an-environment.md): Promotes an existing deployment to an environment and returns the promoted deployment.
- [Any model deployment by ID](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-deployment-to-production.md): Promotes an existing deployment to production and returns the same deployment.
- [Development model deployment](https://docs.baseten.co/reference/management-api/deployments/promote/promotes-a-development-deployment-to-production.md): Creates a new production deployment from the development deployment and returns the currently building deployment.
- [Create Chain environment](https://docs.baseten.co/reference/management-api/environments/create-a-chain-environment.md): Creates a chain environment and returns the resulting environment.
- [Create environment](https://docs.baseten.co/reference/management-api/environments/create-an-environment.md): Creates an environment for the specified model and returns the environment.
- [Get Chain environment](https://docs.baseten.co/reference/management-api/environments/get-a-chain-environments-details.md): Gets a chain environment's details and returns the chain environment.
- [Get all Chain environments](https://docs.baseten.co/reference/management-api/environments/get-all-chain-environments.md): Gets all chain environments for a given chain
- [Get all environments](https://docs.baseten.co/reference/management-api/environments/get-all-environments.md): Gets all environments for a given model
- [Get environment](https://docs.baseten.co/reference/management-api/environments/get-an-environments-details.md): Gets an environment's details and returns the environment.
- [Update Chain environment](https://docs.baseten.co/reference/management-api/environments/update-a-chain-environments-settings.md): Updates a chain environment's settings and returns the chain environment.
- [Update chainlet environment's instance type](https://docs.baseten.co/reference/management-api/environments/update-a-chainlet-environments-instance-type-settings.md): Updates a chainlet environment's instance type settings. The chainlet environment setting must exist. When updated, a new chain deployment is created and deployed, then promoted to the chain environment according to the environment's promotion settings.
- [Update model environment](https://docs.baseten.co/reference/management-api/environments/update-an-environments-settings.md): Updates an environment's settings and returns the updated environment.
- [Delete models](https://docs.baseten.co/reference/management-api/models/deletes-a-model-by-id.md)
- [By ID](https://docs.baseten.co/reference/management-api/models/gets-a-model-by-id.md)
- [All models](https://docs.baseten.co/reference/management-api/models/gets-all-models.md)
- [Overview](https://docs.baseten.co/reference/management-api/overview.md): The management API is used to manage models and deployments. It supports monitoring, CI/CD, and automation at both the model and workspace levels.
- [Get all secrets](https://docs.baseten.co/reference/management-api/secrets/gets-all-secrets.md)
- [Upsert a secret](https://docs.baseten.co/reference/management-api/secrets/upserts-a-secret.md): Creates a new secret or updates an existing secret if one with the provided name already exists. The name and creation date of the created or updated secret are returned.
- [Reference documentation](https://docs.baseten.co/reference/overview.md): For deploying, managing, and interacting with machine learning models on Baseten.
- [Chains SDK Reference](https://docs.baseten.co/reference/sdk/chains.md): Python SDK reference for Chains
- [Truss SDK Reference](https://docs.baseten.co/reference/sdk/truss.md): Python SDK for deploying and managing models with Truss.
- [Configure Truss](https://docs.baseten.co/reference/truss-configuration.md): Set your model resources, dependencies, and more
- [Baseten platform status](https://docs.baseten.co/status/status.md): Current operational status of Baseten's services.
- [Deployments](https://docs.baseten.co/troubleshooting/deployments.md): Troubleshoot common problems during model deployment
- [Inference](https://docs.baseten.co/troubleshooting/inference.md): Troubleshoot common problems during model inference