Structured outputs

Structured outputs let you generate text that conforms to a JSON schema you specify, which makes data extraction reliable and keeps generated text in a predictable shape. Baseten engines like BIS-LLM and Engine-Builder-LLM support structured outputs, as do other inference frameworks like vLLM and SGLang.

Quick start

Structured outputs require two components: a Pydantic schema defining your expected output format, and an API call that enforces that schema.

Define a schema

from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: str  # "low", "medium", "high"
    due_date: str
    description: str

Each field requires a type annotation. The model’s response will conform to these types exactly.
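
To see what the type annotations translate to, it can help to print the JSON schema that Pydantic derives from a class. The sketch below assumes Pydantic v2 (`model_json_schema`). `StrictTask` is a hypothetical variant of `Task` that uses `Literal` to restrict `priority` to the three values mentioned in the comment above; whether a given engine enforces enum constraints is an assumption to verify against your deployment.

```python
import json
from typing import Literal

from pydantic import BaseModel


class StrictTask(BaseModel):  # hypothetical variant of Task, for illustration only
    title: str
    # Literal narrows this field to three allowed strings in the JSON schema
    priority: Literal["low", "medium", "high"]
    due_date: str
    description: str


# Pydantic v2: derive the JSON schema that describes valid output
schema = StrictTask.model_json_schema()
print(json.dumps(schema, indent=2))
```

Fields without a default appear under `required` in the printed schema, and `priority` carries an `enum` listing the allowed strings.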

Generate structured output

import os
from pydantic import BaseModel
from openai import OpenAI

class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1"
)

response = client.beta.chat.completions.parse(
    model="not-required",
    messages=[
        {"role": "user", "content": "Create a task for: Review the quarterly report by next Friday"}
    ],
    response_format=Task
)

task = response.choices[0].message.parsed
print(f"Task: {task.title}")
print(f"Priority: {task.priority}")

Point base_url to your model’s production endpoint. Pass your Pydantic class to response_format and use beta.chat.completions.parse instead of the regular create method. The response includes a parsed attribute with your data already converted to a Task object, so no JSON parsing is needed.
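
If you have only the raw JSON string, for example from `message.content` or from a stored response, it can be validated against the same class. This is a minimal offline sketch, assuming Pydantic v2. The helper `parse_task` and the sample string are hypothetical and stand in for a real response.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


def parse_task(content: str) -> Optional[Task]:
    """Validate a raw JSON string against Task; return None if it does not conform."""
    try:
        return Task.model_validate_json(content)
    except ValidationError:
        # Raised for malformed JSON as well as for missing or mistyped fields
        return None


# Hypothetical sample standing in for response.choices[0].message.content
sample = (
    '{"title": "Review quarterly report", "priority": "high", '
    '"due_date": "next Friday", "description": "Read and comment on the report."}'
)
task = parse_task(sample)
print(task.title if task else "invalid response")
```

Returning `None` instead of raising keeps the caller's retry or fallback logic in one place. Re-raising the error is an equally reasonable choice.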

Engine support

Structured outputs are compatible with:

Engine-Builder-LLM, except when Lookahead speculative decoding is configured.
BIS-LLM, except in a few configurations, such as when the overlap scheduler is enabled.

Model support

All Engine-Builder-LLM and BIS-LLM models support structured outputs with no additional configuration.

Best practices

Schema design

Keep schemas simple: Limit nesting to 2-3 levels for best results.
Use basic types: Prefer str, int, float, and bool when possible.
Set defaults: Provide reasonable default values for optional fields.
Descriptive names: Use clear, descriptive field names.
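
The guidelines above could look like this in practice. This is a sketch assuming Pydantic v2; `Address`, `CustomerRecord`, and their fields are hypothetical names chosen for illustration.

```python
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):  # a single level of nesting
    city: str
    country: str = "US"  # default for a field the model may omit


class CustomerRecord(BaseModel):
    # Basic types and descriptive names
    full_name: str
    age: int
    lifetime_spend: float
    is_active: bool = True  # reasonable default for an optional field
    address: Optional[Address] = None


# Only the required fields are supplied; defaults fill in the rest
record = CustomerRecord(full_name="Ada Lovelace", age=36, lifetime_spend=120.5)
print(record.model_dump())
```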

Prompt engineering

Low temperature: Use 0.1-0.3 for consistent outputs.
Provide schema: Include the Pydantic model's JSON schema and a few examples in the prompt.
Provide context: Give background for complex schemas.
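
One way to combine these tips is to assemble the request arguments in a helper. The sketch below assumes Pydantic v2. `build_request` is a hypothetical helper, and the few-shot example and system prompt wording are illustrative. It builds the arguments only and makes no network call.

```python
import json

from pydantic import BaseModel


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


def build_request(user_prompt: str) -> dict:
    """Assemble keyword arguments for a chat completion call (hypothetical helper)."""
    schema_text = json.dumps(Task.model_json_schema(), indent=2)
    # One few-shot example, serialized the same way the model should respond
    example = Task(
        title="Book flights",
        priority="medium",
        due_date="2025-03-01",
        description="Book round-trip flights for the team offsite.",
    )
    system = (
        "You convert requests into tasks for a project tracker.\n"
        f"Respond with JSON matching this schema:\n{schema_text}"
    )
    return {
        "model": "not-required",
        "temperature": 0.2,  # within the 0.1-0.3 range suggested above
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": "Create a task for: Book flights for the offsite"},
            {"role": "assistant", "content": example.model_dump_json()},
            {"role": "user", "content": user_prompt},
        ],
        "response_format": Task,
    }


request = build_request("Create a task for: Review the quarterly report by next Friday")
print(request["messages"][0]["content"])
```

The resulting dictionary can be unpacked into the call from the quick start, for example `client.beta.chat.completions.parse(**request)`.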

Related

Engine-Builder-LLM overview: Dense model documentation.
BIS-LLM overview: MoE model documentation.
Quantization guide: FP8/FP4 trade-offs.
