# Baseten

## Docs

- [Any deployment by ID](https://docs.baseten.co/api-reference/activates-a-deployment): Activates an inactive deployment and returns the activation status.
- [🆕 Activate environment deployment](https://docs.baseten.co/api-reference/activates-a-deployment-associated-with-an-environment): Activates an inactive deployment associated with an environment and returns the activation status.
- [Development deployment](https://docs.baseten.co/api-reference/activates-a-development-deployment): Activates an inactive development deployment and returns the activation status.
- [Production deployment](https://docs.baseten.co/api-reference/activates-a-production-deployment): Activates an inactive production deployment and returns the activation status.
- [Cancel async request](https://docs.baseten.co/api-reference/cancel-async-request): Use this endpoint to cancel a queued async request.
- [Create a model environment](https://docs.baseten.co/api-reference/create-an-environment): Creates an environment for the specified model and returns the environment.
- [Any deployment by ID](https://docs.baseten.co/api-reference/deactivates-a-deployment): Deactivates a deployment and returns the deactivation status.
- [🆕 Deactivate environment deployment](https://docs.baseten.co/api-reference/deactivates-a-deployment-associated-with-an-environment): Deactivates a deployment associated with an environment and returns the deactivation status.
- [Development deployment](https://docs.baseten.co/api-reference/deactivates-a-development-deployment): Deactivates a development deployment and returns the deactivation status.
- [Production deployment](https://docs.baseten.co/api-reference/deactivates-a-production-deployment): Deactivates a production deployment and returns the deactivation status.
- [Published deployment](https://docs.baseten.co/api-reference/deployment-async-predict): Use this endpoint to call any [published deployment](/deploy/lifecycle) of your model asynchronously.
- [Published deployment](https://docs.baseten.co/api-reference/deployment-get-async-queue-status): Use this endpoint to get the status of a published deployment's async queue.
- [Published deployment](https://docs.baseten.co/api-reference/deployment-predict)
- [Published deployment](https://docs.baseten.co/api-reference/deployment-run-remote)
- [Published deployment](https://docs.baseten.co/api-reference/deployment-wake)
- [Development deployment](https://docs.baseten.co/api-reference/development-async-predict): Use this endpoint to call the [development deployment](/deploy/lifecycle) of your model asynchronously.
- [Development deployment](https://docs.baseten.co/api-reference/development-get-async-queue-status): Use this endpoint to get the status of a development deployment's async queue.
- [Development deployment](https://docs.baseten.co/api-reference/development-predict)
- [Development deployment](https://docs.baseten.co/api-reference/development-run-remote)
- [Development deployment](https://docs.baseten.co/api-reference/development-wake)
- [🆕 Async inference by environment](https://docs.baseten.co/api-reference/environments-async-predict): Use this endpoint to call the model associated with the specified environment asynchronously.
- [Environment deployment](https://docs.baseten.co/api-reference/environments-get-async-queue-status): Use this endpoint to get the async queue status for a model associated with the specified environment.
- [🆕 Inference by environment](https://docs.baseten.co/api-reference/environments-predict)
- [🆕 Inference by environment](https://docs.baseten.co/api-reference/environments-run-remote)
- [Get chain environment](https://docs.baseten.co/api-reference/get-a-chain-environments-details): Gets a chain environment's details and returns the chain environment.
- [Get all chain environments](https://docs.baseten.co/api-reference/get-all-chain-environments): Gets all chain environments for a given chain.
- [Get all model environments](https://docs.baseten.co/api-reference/get-all-environments): Gets all environments for a given model.
- [Get model environment](https://docs.baseten.co/api-reference/get-an-environments-details): Gets an environment's details and returns the environment.
- [Get async request status](https://docs.baseten.co/api-reference/get-async-request-status): Use this endpoint to get the status of an async request.
- [Get a chain by ID](https://docs.baseten.co/api-reference/gets-a-chain-by-id)
- [Any chain deployment by ID](https://docs.baseten.co/api-reference/gets-a-chain-deployment-by-id)
- [Get a model by ID](https://docs.baseten.co/api-reference/gets-a-model-by-id)
- [Any model deployment by ID](https://docs.baseten.co/api-reference/gets-a-models-deployment-by-id): Gets a model's deployment by ID and returns the deployment.
- [Development model deployment](https://docs.baseten.co/api-reference/gets-a-models-development-deployment): Gets a model's development deployment and returns the deployment.
- [Production model deployment](https://docs.baseten.co/api-reference/gets-a-models-production-deployment): Gets a model's production deployment and returns the deployment.
- [Get all chain deployments](https://docs.baseten.co/api-reference/gets-all-chain-deployments)
- [Get all chains](https://docs.baseten.co/api-reference/gets-all-chains)
- [Get all model deployments](https://docs.baseten.co/api-reference/gets-all-deployments-of-a-model)
- [Get all models](https://docs.baseten.co/api-reference/gets-all-models)
- [Get all secrets](https://docs.baseten.co/api-reference/gets-all-secrets)
- [Model endpoint migration guide](https://docs.baseten.co/api-reference/migration-guide): No more JSON wrapper with model output
- [Call primary version](https://docs.baseten.co/api-reference/model-predict)
- [Wake primary version](https://docs.baseten.co/api-reference/model-wake)
- [ChatCompletions](https://docs.baseten.co/api-reference/openai)
- [ChatCompletions (deprecated)](https://docs.baseten.co/api-reference/openai-deprecated)
- [API reference](https://docs.baseten.co/api-reference/overview): Details on model inference and management APIs
- [Production deployment](https://docs.baseten.co/api-reference/production-async-predict): Use this endpoint to call the [production deployment](/deploy/lifecycle) of your model asynchronously.
- [Production deployment](https://docs.baseten.co/api-reference/production-get-async-queue-status): Use this endpoint to get the status of a production deployment's async queue.
- [Production deployment](https://docs.baseten.co/api-reference/production-predict)
- [Production deployment](https://docs.baseten.co/api-reference/production-run-remote)
- [Production deployment](https://docs.baseten.co/api-reference/production-wake)
- [🆕 Promote to chain environment](https://docs.baseten.co/api-reference/promotes-a-chain-deployment-to-an-environment): Promotes an existing chain deployment to an environment and returns the promoted chain deployment.
- [🆕 Promote to model environment](https://docs.baseten.co/api-reference/promotes-a-deployment-to-an-environment): Promotes an existing deployment to an environment and returns the promoted deployment.
- [Any model deployment by ID](https://docs.baseten.co/api-reference/promotes-a-deployment-to-production): Promotes an existing deployment to production and returns the same deployment.
- [Development model deployment](https://docs.baseten.co/api-reference/promotes-a-development-deployment-to-production): Creates a new production deployment from the development deployment and returns the deployment that is currently building.
- [Update model environment](https://docs.baseten.co/api-reference/update-an-environments-settings): Updates an environment's settings and returns the updated environment.
- [Any model deployment by ID](https://docs.baseten.co/api-reference/updates-a-deployments-autoscaling-settings): Updates a deployment's autoscaling settings and returns the update status.
- [Development model deployment](https://docs.baseten.co/api-reference/updates-a-development-deployments-autoscaling-settings): Updates a development deployment's autoscaling settings and returns the update status.
- [Production model deployment](https://docs.baseten.co/api-reference/updates-a-production-deployments-autoscaling-settings): Updates a production deployment's autoscaling settings and returns the update status.
- [Upsert a secret](https://docs.baseten.co/api-reference/upserts-a-secret): Creates a new secret or updates an existing secret if one with the provided name already exists. The name and creation date of the created or updated secret are returned.
- [Call model version](https://docs.baseten.co/api-reference/version-predict)
- [Wake model version](https://docs.baseten.co/api-reference/version-wake)
- [Chains CLI reference](https://docs.baseten.co/chains-reference/cli): Details on Chains CLI
- [Chains reference](https://docs.baseten.co/chains-reference/overview): Details on Chains CLI and configuration options
- [Chains SDK Reference](https://docs.baseten.co/chains-reference/sdk): Python SDK Reference for Chains
- [Concepts](https://docs.baseten.co/chains/concepts): Glossary of Chains concepts and terminology
- [Audio Transcription Chain](https://docs.baseten.co/chains/examples/audio-transcription): Transcribe hours of audio to text in a few seconds
- [RAG Chain](https://docs.baseten.co/chains/examples/build-rag): Build a RAG (retrieval-augmented generation) pipeline with Chains
- [Build your first Chain](https://docs.baseten.co/chains/getting-started): Build and deploy two example Chains
- [User Guides](https://docs.baseten.co/chains/guide): Using the full potential of Chains
- [Overview](https://docs.baseten.co/chains/overview): Chains: A new DX for deploying multi-component ML workflows
- [Autoscaling](https://docs.baseten.co/deploy/autoscaling): Scale from internal testing to the top of Hacker News
- [Deployments and environments](https://docs.baseten.co/deploy/lifecycle): Deployment lifecycle on Baseten
- [Setting GPU resources](https://docs.baseten.co/deploy/resources): Serve your model on the right instance type
- [Troubleshooting](https://docs.baseten.co/deploy/troubleshooting): Fixing common problems during model deployment
- [Async inference user guide](https://docs.baseten.co/invoke/async): Run asynchronous inference on deployed models
- [Securing async inference](https://docs.baseten.co/invoke/async-secure): Secure the asynchronous inference results sent to your webhook
- [How to parse base64 output](https://docs.baseten.co/invoke/base64): Decode and save model output
- [How to do model I/O in binary](https://docs.baseten.co/invoke/binary): Decode and save binary model output
- [How to do model I/O with files](https://docs.baseten.co/invoke/files): Call models by passing a file or URL
- [Function calling (tool use)](https://docs.baseten.co/invoke/function-calling): Use an LLM to select amongst provided tools
- [Baseten model integrations](https://docs.baseten.co/invoke/integrations): Use your Baseten models with tools like LangChain
- [How to call your model](https://docs.baseten.co/invoke/quickstart): Run inference on deployed models
- [How to stream model output](https://docs.baseten.co/invoke/streaming): Reduce time to first token for LLMs
- [Structured output (JSON mode)](https://docs.baseten.co/invoke/structured-output): Enforce an output schema on LLM inference
- [Troubleshooting](https://docs.baseten.co/invoke/troubleshooting): Fixing common problems during model inference
- [Workspace access control](https://docs.baseten.co/observability/access): Share your Baseten workspace with your team
- [Best practices for API keys](https://docs.baseten.co/observability/api-keys): Securely access your Baseten models
- [Export metrics to Datadog](https://docs.baseten.co/observability/export-metrics/datadog): Export metrics from Baseten to Datadog
- [Export metrics to Grafana Cloud](https://docs.baseten.co/observability/export-metrics/grafana): Export metrics from Baseten to Grafana Cloud
- [Export metrics to New Relic](https://docs.baseten.co/observability/export-metrics/new-relic): Export metrics from Baseten to New Relic
- [Metrics export overview](https://docs.baseten.co/observability/export-metrics/overview): Export metrics from Baseten to your observability stack
- [Export metrics to Prometheus](https://docs.baseten.co/observability/export-metrics/prometheus): Export metrics from Baseten to Prometheus
- [Metrics support matrix](https://docs.baseten.co/observability/export-metrics/supported-metrics): Which metrics can be exported
- [Monitoring model health](https://docs.baseten.co/observability/health): Diagnose and fix model server issues
- [Reading model metrics](https://docs.baseten.co/observability/metrics): Understand the load and performance of your model
- [Best practices for secrets](https://docs.baseten.co/observability/secrets): Securely store and access passwords, tokens, keys, and more
- [Secure model inference](https://docs.baseten.co/observability/security): Keeping your models safe and private
- [Tracing](https://docs.baseten.co/observability/tracing): Investigate the prediction flow in detail
- [Billing and usage](https://docs.baseten.co/observability/usage): Manage payments and track overall Baseten usage
- [How to get faster cold starts](https://docs.baseten.co/performance/cold-starts): Engineering your Truss and application for faster cold starts
- [Setting concurrency](https://docs.baseten.co/performance/concurrency): Handle variable throughput with this autoscaling parameter
- [Engine Builder configuration](https://docs.baseten.co/performance/engine-builder-config): Configure your TensorRT-LLM inference engine
- [Engine control in Python](https://docs.baseten.co/performance/engine-builder-customization): Use `model.py` to customize engine behavior
- [Engine Builder overview](https://docs.baseten.co/performance/engine-builder-overview): Deploy optimized model inference servers in minutes
- [Build your first LLM engine](https://docs.baseten.co/performance/engine-builder-tutorial): Automatically build and deploy a TensorRT-LLM model serving engine
- [Llama 3 with TensorRT-LLM](https://docs.baseten.co/performance/examples/llama-trt): Build an optimized inference engine for Llama 3.1 8B
- [Mistral with TensorRT-LLM](https://docs.baseten.co/performance/examples/mistral-trt): Build an optimized inference engine for Mistral
- [Qwen with TensorRT-LLM](https://docs.baseten.co/performance/examples/qwen-trt): Build an optimized inference engine for Qwen
- [Instance type reference](https://docs.baseten.co/performance/instances): Specs and recommendations for every instance type on Baseten
- [Model performance overview](https://docs.baseten.co/performance/overview): Improve your latency and throughput
- [Deploy your first model](https://docs.baseten.co/quickstart): From model weights to API endpoint
- [truss](https://docs.baseten.co/truss-reference/cli): The simplest way to serve models in production
- [truss cleanup](https://docs.baseten.co/truss-reference/cli/cleanup): Clean up truss data.
- [truss container](https://docs.baseten.co/truss-reference/cli/container): Subcommands for truss container.
- [truss image](https://docs.baseten.co/truss-reference/cli/image): Subcommands for truss image.
- [truss init](https://docs.baseten.co/truss-reference/cli/init): Create a new Truss.
- [truss login](https://docs.baseten.co/truss-reference/cli/login): Authenticate with Baseten.
- [truss predict](https://docs.baseten.co/truss-reference/cli/predict): Invokes the packaged model.
- [truss push](https://docs.baseten.co/truss-reference/cli/push): Pushes a truss to a TrussRemote.
- [truss run-python](https://docs.baseten.co/truss-reference/cli/run-python): Subcommands for truss run-python.
- [truss watch](https://docs.baseten.co/truss-reference/cli/watch): Seamless remote development with truss.
- [Config options](https://docs.baseten.co/truss-reference/config): Set your model resources, dependencies, and more
- [Truss reference](https://docs.baseten.co/truss-reference/overview): Details on Truss CLI and configuration options
- [Truss Python SDK Reference](https://docs.baseten.co/truss-reference/python-sdk): Python SDK Reference for Truss
- [Getting Started](https://docs.baseten.co/truss/examples/01-getting-started-bert): Building your first Truss
- [LLM](https://docs.baseten.co/truss/examples/02-llm): Building an LLM
- [LLM with Streaming](https://docs.baseten.co/truss/examples/03-llm-with-streaming): Building an LLM with streaming output
- [Text-to-image](https://docs.baseten.co/truss/examples/04-image-generation): Building a text-to-image model with SDXL
- [Fast Cold Starts with Cached Weights](https://docs.baseten.co/truss/examples/06-high-performance-cached-weights): Deploy a language model with the model weights cached at build time
- [Private Hugging Face Model](https://docs.baseten.co/truss/examples/09-private-huggingface): Load a model that requires authentication with Hugging Face
- [Model with system packages](https://docs.baseten.co/truss/examples/10-using-system-packages): Deploy a model with both Python and system dependencies
- [Base Docker images](https://docs.baseten.co/truss/guides/base-images): A guide to configuring a base image for your truss
- [Running custom docker commands](https://docs.baseten.co/truss/guides/build-commands): How to run your own Docker commands during the build stage
- [Deploy Llama 2 with Caching](https://docs.baseten.co/truss/guides/cached-weights): Enable fast cold starts for a model with private Hugging Face weights
- [Request concurrency](https://docs.baseten.co/truss/guides/concurrency): A guide to setting concurrency for your model
- [Deploy Custom Server from Docker image](https://docs.baseten.co/truss/guides/custom-server): A config.yaml is all you need
- [Model weights](https://docs.baseten.co/truss/guides/data-directory): Load model weights without Hugging Face or S3
- [Access model environments](https://docs.baseten.co/truss/guides/environments): A guide to leveraging environments in your models
- [External (source) packages](https://docs.baseten.co/truss/guides/external-packages): A guide to configuring your truss to use external packages
- [Caching model weights](https://docs.baseten.co/truss/guides/model-cache): Accelerate cold starts by caching your weights
- [Pre/post-processing](https://docs.baseten.co/truss/guides/pre-process): Deploy a model that makes use of pre/post-processing
- [Private Hugging Face model](https://docs.baseten.co/truss/guides/private-model): Load a model that requires authentication with Hugging Face
- [Using request objects / Cancellation](https://docs.baseten.co/truss/guides/requests): Get more control by directly using the request object.
- [Returning response objects and SSEs](https://docs.baseten.co/truss/guides/responses): Get more control by directly creating the response object.
- [Storing secrets in Baseten](https://docs.baseten.co/truss/guides/secrets): A guide to using secrets securely in your ML models
- [Streaming output with an LLM](https://docs.baseten.co/truss/guides/streaming): Deploy an LLM and stream the output
- [Model with system packages](https://docs.baseten.co/truss/guides/system-packages): Deploy a model with both Python and system dependencies
- [Overview](https://docs.baseten.co/truss/overview): Truss: Package and deploy AI models on Baseten
- [Welcome to Baseten!](https://docs.baseten.co/welcome): Fast, scalable inference in our cloud or yours

## Optional

- [Changelog](https://www.baseten.co/changelog/)
- [Model library](https://www.baseten.co/library)
- [Truss examples](https://github.com/basetenlabs/truss-examples)