Structured outputs let you generate text that conforms to specific JSON schemas, providing reliable data extraction and controlled text generation. This feature is supported by Baseten engines like BIS-LLM and Engine-Builder-LLM, as well as other inference frameworks like vLLM and SGLang.

Quick start

Structured outputs require two components: a Pydantic schema defining your expected output format, and an API call that enforces that schema.

Define a schema

from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: str  # "low", "medium", "high"
    due_date: str
    description: str

Each field requires a type annotation. The model’s response will conform to these types exactly.
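
To see what the engine is asked to enforce, it can help to print the JSON schema that Pydantic derives from the class. The following is a minimal sketch, assuming Pydantic v2 (where the method is named model_json_schema); it runs locally and makes no API call.

```python
import json
from pydantic import BaseModel


class Task(BaseModel):
    title: str
    priority: str  # "low", "medium", "high"
    due_date: str
    description: str


# Assumes Pydantic v2: model_json_schema() returns the schema as a plain dict.
schema = Task.model_json_schema()

# Fields without a default are listed under "required".
print(json.dumps(schema, indent=2))
```

Every field in this class has a type annotation and no default, so all four names should appear in the schema's "required" list.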

Generate structured output

import os
from pydantic import BaseModel
from openai import OpenAI

class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1"
)

response = client.beta.chat.completions.parse(
    model="not-required",
    messages=[
        {"role": "user", "content": "Create a task for: Review the quarterly report by next Friday"}
    ],
    response_format=Task
)

task = response.choices[0].message.parsed
print(f"Task: {task.title}")
print(f"Priority: {task.priority}")

Point base_url to your model’s production endpoint, replacing the model-xxxxxx placeholder with your own model ID. Pass your Pydantic class to response_format and call beta.chat.completions.parse instead of the regular create method. The returned message has a parsed attribute that holds your data as a Task object, so you do not need to parse the JSON yourself.
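
For comparison, here is roughly the step that the parsed attribute spares you: validating a raw JSON string against the schema yourself. This is a sketch under stated assumptions, not a description of the SDK's internals. It assumes Pydantic v2, and the parse_task helper and both sample strings are hypothetical, made up for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


def parse_task(raw_json: str) -> Optional[Task]:
    """Hypothetical helper: return a Task, or None if the text does not match."""
    try:
        # Assumes Pydantic v2: parses the JSON and validates it in one step.
        return Task.model_validate_json(raw_json)
    except ValidationError:
        return None


# Made-up sample payloads, standing in for message content from a model.
good = (
    '{"title": "Review quarterly report", "priority": "high", '
    '"due_date": "next Friday", "description": "Read and comment."}'
)
incomplete = '{"title": "Review quarterly report"}'

print(parse_task(good))
print(parse_task(incomplete))
```

A helper like this may still be useful as a safety net when you call the regular create method or consume output from a source that does not enforce the schema.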

Engine support

Structured outputs are compatible with:
  • Engine-Builder-LLM, except when Lookahead speculative decoding is configured.
  • BIS-LLM, except in a few configurations, such as when the overlap scheduler is enabled.

Model support

All Engine-Builder-LLM and BIS-LLM models support structured outputs out of the box; no additional configuration is required.

Best practices

Schema design

  • Keep schemas simple: Limit nesting to 2-3 levels for best results.
  • Use basic types: str, int, float, bool when possible.
  • Set defaults: Provide reasonable default values for optional fields.
  • Descriptive names: Use clear, descriptive field names.
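
The schema guidelines above can be sketched as a single class. This is an illustrative example, assuming Pydantic v2; the Assignee model and the extra fields (estimated_hours, is_blocked, assignee) are hypothetical. Whether a given engine accepts Literal and Optional fields is also an assumption, so test against your own deployment.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class Assignee(BaseModel):
    # One level of nesting, well within the 2-3 level guideline.
    name: str
    email: Optional[str] = None


class Task(BaseModel):
    title: str
    # Literal restricts the value instead of relying on a comment.
    priority: Literal["low", "medium", "high"] = "medium"
    due_date: str = Field(description="ISO 8601 date, for example 2025-01-31")
    # Basic types with sensible defaults for optional information.
    estimated_hours: float = 1.0
    is_blocked: bool = False
    assignee: Optional[Assignee] = None


# Only the fields without defaults must be supplied.
task = Task(title="Review the quarterly report", due_date="2025-01-31")
print(task.priority, task.estimated_hours, task.is_blocked)

# A value outside the Literal set is rejected by validation.
try:
    Task(title="x", due_date="2025-01-31", priority="urgent")
    rejected = False
except ValidationError:
    rejected = True
print("invalid priority rejected:", rejected)
```

Using Literal for priority moves the "low", "medium", "high" constraint out of a comment and into the schema itself.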

Prompt engineering

  • Low temperature: Use 0.1-0.3 for consistent outputs.
  • Provide schema: Include the model's JSON schema and a few few-shot examples in the prompt.
  • Provide context: Give background for complex schemas.
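
One way to combine these prompt tips is to build the request arguments in a small helper. The following is a sketch, assuming Pydantic v2; build_request, the system prompt wording, and the example pair are all hypothetical, and 0.2 is simply a value inside the suggested 0.1-0.3 range.

```python
import json

from pydantic import BaseModel


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


def build_request(model_cls, user_request, examples):
    """Hypothetical helper: assemble keyword arguments for a parse call."""
    # Provide schema: dump the JSON schema into the system message.
    schema_text = json.dumps(model_cls.model_json_schema(), indent=2)
    messages = [{
        "role": "system",
        "content": "Reply with JSON matching this schema:\n" + schema_text,
    }]
    # Few-shot examples: each one is a (prompt, expected object) pair.
    for prompt, answer in examples:
        messages.append({"role": "user", "content": prompt})
        messages.append({"role": "assistant", "content": answer.model_dump_json()})
    messages.append({"role": "user", "content": user_request})
    return {
        "model": "not-required",
        "messages": messages,
        "temperature": 0.2,  # low temperature for consistent outputs
        "response_format": model_cls,
    }


example = Task(
    title="Send invoice",
    priority="high",
    due_date="tomorrow",
    description="Email the March invoice to the client.",
)
request = build_request(
    Task,
    "Create a task for: Review the quarterly report by next Friday",
    [("Create a task for: Send the invoice tomorrow", example)],
)
print(len(request["messages"]), request["temperature"])
```

With a client configured as in the quick start, the result could be passed along as client.beta.chat.completions.parse(**request).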

Further reading