Set up your API key and SDK
Generate a personal API key from your Baseten account and install a client SDK to call models.Export your API key
Install a client SDK
Run inference
Every Model API is compatible with the OpenAI SDK, with Anthropic SDK support in beta. Most also support tool calling, structured outputs, and more. Call a model using the OpenAI SDK. This example useszai-org/GLM-5, but you can swap in any supported model.
- Python
- JavaScript
- cURL
Create a chat completion:
chat.py
Stream the response
Streaming returns the response token by token as the model generates it, instead of waiting for the full reply. The first tokens appear immediately, which makes chat UIs and other interactive applications feel responsive.- Python
- JavaScript
Set
stream=True to receive tokens as they’re generated:stream.py
Explore Model API features
Structured outputs
Generate JSON that conforms to a schema you define.
Tool calling
Let the model invoke functions and use the results in its response.
Reasoning
Enable extended thinking for multi-step problem solving.
Next steps
Platform overview
Deploy models, run multi-step pipelines, train and fine-tune. See everything Baseten offers.
Deploy your first model
Go beyond Model APIs with a config-only Truss deployment on dedicated GPUs.