Structured outputs let you generate text that conforms to specific JSON schemas, providing reliable data extraction and controlled text generation. Baseten engines like BIS-LLM and Engine-Builder-LLM support structured outputs, as do other inference frameworks like vLLM and SGLang.

Quick start

Structured outputs require two components: a Pydantic schema defining your expected output format, and an API call that enforces that schema.

Define a schema

from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: str  # "low", "medium", "high"
    due_date: str
    description: str
Each field requires a type annotation. The model’s response will conform to these types exactly.
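
The comment on priority lists three allowed values, but a plain str annotation does not restrict the field to them. One way to encode the restriction in the schema itself is sketched below, assuming Pydantic v2; the class name StrictTask is hypothetical, and whether a given engine enforces enum constraints during generation is an assumption to verify against your deployment.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictTask(BaseModel):
    """Hypothetical variant of Task that restricts priority to three values."""
    title: str
    priority: Literal["low", "medium", "high"]
    due_date: str
    description: str


# The JSON schema generated from the class; this is what a schema-enforcing
# backend would receive.
schema = StrictTask.model_json_schema()
priority_schema = schema["properties"]["priority"]

# Pydantic itself rejects values outside the Literal when validating locally.
try:
    StrictTask(title="t", priority="urgent", due_date="Friday", description="d")
    rejected = False
except ValidationError:
    rejected = True

print(priority_schema)
print("invalid priority rejected:", rejected)
```

With this variant the allowed values live in the schema rather than in a comment, so local validation catches an out-of-range priority even if the backend does not.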

Generate structured output

import os
from pydantic import BaseModel
from openai import OpenAI

class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1"
)

response = client.beta.chat.completions.parse(
    model="not-required",
    messages=[
        {"role": "user", "content": "Create a task for: Review the quarterly report by next Friday"}
    ],
    response_format=Task
)

task = response.choices[0].message.parsed
print(f"Task: {task.title}")
print(f"Priority: {task.priority}")
Point base_url to your model’s production endpoint. Pass your Pydantic class to response_format and use beta.chat.completions.parse instead of the regular create method. The response includes a parsed attribute with your data already converted to a Task object, so no JSON parsing is needed.
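
If you use the regular create method instead, or want a defensive check against incomplete output (for example, a response cut off by a token limit), you can validate the raw message content yourself. The helper below is a minimal sketch assuming Pydantic v2; parse_task is a hypothetical name, not part of the OpenAI client or of Baseten.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


def parse_task(raw_content: Optional[str]) -> Optional[Task]:
    """Validate raw JSON text against Task.

    Returns None when the content is missing, is not valid JSON,
    or does not match the schema.
    """
    if not raw_content:
        return None
    try:
        return Task.model_validate_json(raw_content)
    except ValidationError:
        return None


# A complete payload validates; a truncated one does not.
complete = '{"title": "Review report", "priority": "high", "due_date": "Friday", "description": "Quarterly report"}'
truncated = '{"title": "Review report", "priority": "hi'

print(parse_task(complete))
print(parse_task(truncated))
```

In a real call, raw_content would come from response.choices[0].message.content.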

Engine support

Structured outputs are compatible with:
  • Engine-Builder-LLM, except when Lookahead speculative decoding is configured.
  • BIS-LLM, except in a few configurations, such as when the overlap scheduler is enabled.

Model support

All Engine-Builder-LLM and BIS-LLM models support structured outputs with no additional configuration.

Best practices

Schema design

  • Keep schemas simple: Limit nesting to 2-3 levels for best results.
  • Use basic types: str, int, float, bool when possible.
  • Set defaults: Provide reasonable default values for optional fields.
  • Descriptive names: Use clear, descriptive field names.
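
The guidelines above can be combined in a single schema. The following is a sketch assuming Pydantic v2; Assignee, TaskWithDefaults, and their field names are hypothetical and chosen only for illustration. How an engine treats fields that are not marked as required is an assumption to check against your deployment.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Assignee(BaseModel):
    """One level of nesting, kept shallow on purpose."""
    name: str
    email: Optional[str] = None


class TaskWithDefaults(BaseModel):
    # Descriptive names and basic types (str, float, bool).
    title: str = Field(description="Short summary of the task")
    priority: str = "medium"        # default for an optional field
    estimated_hours: float = 1.0
    is_blocked: bool = False
    assignee: Optional[Assignee] = None


schema = TaskWithDefaults.model_json_schema()

# Fields with defaults are not listed as required in the generated schema.
required_fields = schema["required"]
print(required_fields)

# Omitted optional fields fall back to their defaults on validation.
task = TaskWithDefaults(title="Review the quarterly report")
print(task.priority, task.estimated_hours, task.is_blocked, task.assignee)
```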

Prompt engineering

  • Low temperature: Use 0.1-0.3 for consistent outputs.
  • Provide schema: Include the JSON schema and few-shot examples in the prompt.
  • Provide context: Give background for complex schemas.
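
As an illustration of these points, the sketch below assembles the arguments for a parse call with a low temperature, the JSON schema, and one few-shot example in the system message. It assumes Pydantic v2; build_request, the prompt wording, and the example task are hypothetical, and no network call is made.

```python
import json

from pydantic import BaseModel


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


def build_request(user_prompt: str) -> dict:
    """Assemble keyword arguments for client.beta.chat.completions.parse."""
    schema_text = json.dumps(Task.model_json_schema(), indent=2)
    # One few-shot example, serialized from a real Task so it always
    # matches the schema.
    example = Task(
        title="Send invoice",
        priority="high",
        due_date="Monday",
        description="Email the March invoice to the client.",
    )
    system_prompt = (
        "You turn requests into tasks for a project tracker.\n"
        f"Respond with JSON matching this schema:\n{schema_text}\n"
        f"Example output:\n{example.model_dump_json()}"
    )
    return {
        "model": "not-required",
        "temperature": 0.2,  # within the suggested 0.1-0.3 range
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "response_format": Task,
    }


request_kwargs = build_request(
    "Create a task for: Review the quarterly report by next Friday"
)
print(request_kwargs["messages"][0]["content"])
```

The resulting dictionary could then be passed as client.beta.chat.completions.parse(**request_kwargs), using the client from the quick start.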