Pass the schema to the LLM with the response_format argument.
Receive output that is guaranteed to match the provided schema, including types and validations like max_length.
With structured output, you should see roughly the same tokens-per-second output speed as an ordinary call to the model, after an initial delay for schema processing. If you’re interested in the mechanisms behind structured output, check out this engineering deep dive on our blog.
Pydantic is an industry-standard Python library for data validation. With Pydantic, we’ll build precise schemas for LLM output to match. For example, here’s a schema for a basic Person object.
```python
from pydantic import BaseModel, Field

class Person(BaseModel):
    first_name: str = Field(..., description="The person's first name", max_length=50)
    last_name: str = Field(..., description="The person's last name", max_length=50)
    age: int = Field(..., description="The person's age, must be a non-negative integer")
    email: str = Field(..., description="The person's email address")
```
Structured output supports multiple data types, required and optional fields, and additional validations like max_length.
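As a quick local sanity check (separate from the API call itself), you can exercise these validations with Pydantic directly — for example, a `first_name` longer than 50 characters is rejected. The sample values here are made up for illustration:

```python
from pydantic import BaseModel, Field, ValidationError

class Person(BaseModel):
    first_name: str = Field(..., description="The person's first name", max_length=50)
    last_name: str = Field(..., description="The person's last name", max_length=50)
    age: int = Field(..., description="The person's age, must be a non-negative integer")
    email: str = Field(..., description="The person's email address")

# Valid data parses cleanly into a typed object
person = Person.model_validate(
    {"first_name": "Ada", "last_name": "Lovelace", "age": 36, "email": "ada@example.com"}
)
print(person.first_name)  # Ada

# A first_name over 50 characters violates the max_length constraint
try:
    Person.model_validate(
        {"first_name": "x" * 51, "last_name": "Lovelace", "age": 36, "email": "ada@example.com"}
    )
except ValidationError as exc:
    print("Validation failed with", exc.error_count(), "error(s)")
```

The same constraints encoded in this schema are what the model's output is guaranteed to satisfy when you pass the schema via `response_format`.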
The first time you pass a given schema to the model, it can take up to a minute for the schema to be processed and cached. Subsequent calls with the same schema run at normal speed.
Once your object is defined, you can add it as a parameter to your LLM call with the response_format field:
```python
import json

import requests

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Make up a new person!"},
    ],
    "max_tokens": 512,
    "response_format": {  # Add this parameter to use structured outputs
        "type": "json_schema",
        "json_schema": {"schema": Person.model_json_schema()},
    },
}

MODEL_ID = ""
BASETEN_API_KEY = ""

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {BASETEN_API_KEY}"},
    json=payload,
)

print(json.loads(resp.text))
```
The response may include a trailing end-of-sequence token, which must be removed before the JSON can be parsed.
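A minimal sketch of that cleanup step — the specific end-of-sequence markers listed here (`</s>`, `<|eot_id|>`, `<|endoftext|>`) are common examples, but the actual token depends on the model you deployed:

```python
import json

def parse_structured_response(raw: str) -> dict:
    """Strip a trailing end-of-sequence token, if present, then parse the JSON."""
    # Common EOS markers; adjust for the tokenizer of your deployed model.
    for eos in ("</s>", "<|eot_id|>", "<|endoftext|>"):
        raw = raw.removesuffix(eos)
    return json.loads(raw.strip())

# Example: a raw response with a trailing EOS token
raw = '{"first_name": "Ada", "last_name": "Lovelace", "age": 36, "email": "ada@example.com"}</s>'
person_data = parse_structured_response(raw)
print(person_data["first_name"])  # Ada
```

`str.removesuffix` (Python 3.9+) only removes the marker when it appears at the very end of the string, so JSON string values containing similar text are left untouched.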