Prerequisites
- A Baseten account with an API key
- Python 3.9+ (check with `python3 --version`)
- A package installer: uv (recommended) or pip
Run inference
Call a model using the OpenAI SDK. This example uses GLM-4.7, but you can substitute any model from the supported models list. Examples are available in Python, JavaScript, and cURL; the Python version is shown below.
Install the OpenAI SDK if you don't have it (`pip install openai`), then create a chat completion:
chat.py
Stream the response
For real-time applications, set `stream: true` to receive tokens as they're generated:
- Python
- JavaScript
stream.py
Explore Model API features
Model APIs support the full OpenAI Chat Completions API. Constrain outputs to a JSON schema, let the model call functions you define, or enable extended thinking for complex tasks. See the Model APIs documentation for the full parameter reference and supported models.

Structured outputs
Generate JSON that conforms to a schema you define.
Tool calling
Let the model invoke functions and use the results in its response.
Reasoning
Enable extended thinking for multi-step problem solving.
Next steps
Platform overview
Deploy models, run multi-step pipelines, train and fine-tune — see everything Baseten offers.
Deploy your first model
Go beyond Model APIs with a config-only Truss deployment on dedicated GPUs.