Model APIs are served at https://inference.baseten.co/v1; authenticate with your Baseten API key and point the OpenAI SDK at that base URL. The Model APIs documentation lists models, pricing, and feature support. For what happens after the gateway (routing, replicas, queuing, retries, cold starts), see Request lifecycle.
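A minimal sketch of the request shape the gateway expects, using only the standard library (the OpenAI SDK, pointed at the same base URL with your key, sends an equivalent request). The model slug and message content here are illustrative placeholders, not values from the docs:

```python
import json
import os
import urllib.request

BASE_URL = "https://inference.baseten.co/v1"

def chat_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completions request for the Baseten gateway."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # your Baseten API key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request(
    "example-org/example-model",  # placeholder slug; see Model APIs docs for real ones
    [{"role": "user", "content": "Hello"}],
    os.environ.get("BASETEN_API_KEY", "dummy-key"),
)
# urllib.request.urlopen(req) would send it; skipped here without a real key.
```

With the OpenAI SDK, the same call is `OpenAI(base_url=BASE_URL, api_key=...)` followed by `client.chat.completions.create(...)`.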
Inference API
When you deploy your own model, pick an interface that matches your payloads. Engine-Builder-LLM, BIS-LLM, and BEI expose /v1/chat/completions (or /v1/embeddings for BEI) on https://inference.baseten.co/v1 with OpenAI-compatible parameters for structured outputs, tool calling, reasoning, and streaming. Custom Truss code can use /predict for arbitrary JSON when chat or embeddings are not a good fit. Use the Inference API reference for paths, methods, and errors.
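The interface-to-endpoint mapping above can be sketched as a small lookup, with illustrative payloads for each style. The interface keys follow the names in this section; the payload field names (especially for /predict, whose schema is defined entirely by your Truss code) are hypothetical:

```python
def endpoint_for(interface: str) -> str:
    """Map a deployment interface to its request path, per the docs above."""
    paths = {
        "engine-builder-llm": "/v1/chat/completions",
        "bis-llm": "/v1/chat/completions",
        "bei": "/v1/embeddings",   # BEI serves embeddings, not chat
        "custom-truss": "/predict",  # arbitrary JSON, schema defined by your model code
    }
    return paths[interface]

# OpenAI-compatible chat payload: structured outputs, tools, and
# streaming all ride along as standard parameters on this endpoint.
chat_payload = {
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "stream": True,
}

# Arbitrary JSON for a custom Truss /predict endpoint --
# these field names are made up; your model defines the real schema.
predict_payload = {"text": "raw input", "top_k": 5}
```

Choosing the OpenAI-compatible interfaces keeps existing SDKs and tooling working unchanged; /predict trades that compatibility for full control over the payload shape.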