Structured outputs

Structured outputs let you generate text that conforms to specific JSON schemas, providing reliable data extraction and controlled text generation. This feature is supported by Baseten engines like BIS-LLM and Engine-Builder-LLM, as well as other inference frameworks like vLLM and SGLang.

Quick start

Structured outputs require two components: a Pydantic schema defining your expected output format, and an API call that enforces that schema.

Define a schema

from pydantic import BaseModel

class Task(BaseModel):
    title: str
    priority: str  # "low", "medium", "high"
    due_date: str
    description: str

Each field requires a type annotation. The model’s response will conform to these types exactly.
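The comment on `priority` above lists the allowed values, but a plain `str` annotation does not enforce them. One way to put that constraint into the schema is `typing.Literal`. This is a minimal sketch assuming Pydantic v2, and assuming the engine honors `enum` constraints in the generated JSON schema. `StrictTask` is a hypothetical name used only for illustration.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class StrictTask(BaseModel):  # hypothetical variant of Task, for illustration
    title: str
    priority: Literal["low", "medium", "high"]  # emitted as an "enum" in the JSON schema
    due_date: str
    description: str


# Inspect the JSON schema that Pydantic derives from the class.
schema = StrictTask.model_json_schema()
print(schema["properties"]["priority"])

# Local validation rejects values outside the allowed set.
try:
    StrictTask(title="t", priority="urgent", due_date="2025-01-01", description="d")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Even if the engine does not enforce the enum during generation, validating the response locally against the same class catches out-of-range values.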

Generate structured output

import os
from pydantic import BaseModel
from openai import OpenAI

class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str

client = OpenAI(
    api_key=os.environ['BASETEN_API_KEY'],
    base_url="https://model-xxxxxx.api.baseten.co/environments/production/sync/v1"
)

response = client.beta.chat.completions.parse(
    model="not-required",
    messages=[
        {"role": "user", "content": "Create a task for: Review the quarterly report by next Friday"}
    ],
    response_format=Task
)

task = response.choices[0].message.parsed
print(f"Task: {task.title}")
print(f"Priority: {task.priority}")

Point base_url to your model’s production endpoint. Pass your Pydantic class to response_format and use beta.chat.completions.parse instead of the regular create method. The response includes a parsed attribute with your data already converted to a Task object, so no JSON parsing is needed.
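To make the last point concrete, the sketch below approximates the work that the `parsed` attribute saves you: validating the raw JSON text against the schema yourself. It assumes Pydantic v2. `raw_content` is a hand-written stand-in for the message content rather than real model output, and `parse_task` is a hypothetical helper.

```python
import json

from pydantic import BaseModel, ValidationError


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


# Illustrative stand-in for response.choices[0].message.content; not real model output.
raw_content = json.dumps({
    "title": "Review the quarterly report",
    "priority": "high",
    "due_date": "next Friday",
    "description": "Read and comment on the quarterly report.",
})


def parse_task(content: str):
    """Return a Task if the content matches the schema, else None."""
    try:
        return Task.model_validate_json(content)
    except ValidationError:
        return None


task = parse_task(raw_content)
print(task.title)

# A payload with missing fields fails validation instead of yielding a partial object.
print(parse_task('{"title": "only a title"}'))
```

A fallback along these lines may be useful with clients that only expose the raw message content.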

Engine support

Structured outputs are compatible with:

Engine-Builder-LLM, except when Lookahead speculative decoding is configured.
BIS-LLM, except in a few configurations, such as when the overlap scheduler is enabled.

Model support

All Engine-Builder-LLM and BIS-LLM models support structured outputs with no additional configuration.

Best practices

Schema design

Keep schemas simple: Limit nesting to 2-3 levels for best results.
Use basic types: str, int, float, bool when possible.
Set defaults: Provide reasonable default values for optional fields.
Descriptive names: Use clear, descriptive field names.
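The guidelines above can be combined in a single schema. This is a sketch assuming Pydantic v2. `ProjectTask` and `Assignee` are hypothetical models, and whether a given engine handles optional fields with defaults during constrained generation is worth verifying for your deployment.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Assignee(BaseModel):  # a single level of nesting
    name: str
    email: Optional[str] = None  # optional field with an explicit default


class ProjectTask(BaseModel):  # hypothetical schema, for illustration
    title: str
    estimated_hours: float = 1.0
    is_blocking: bool = False
    tags: List[str] = Field(default_factory=list)
    assignee: Optional[Assignee] = None


# Fields omitted from the input fall back to their defaults.
task = ProjectTask.model_validate({"title": "Review report", "assignee": {"name": "Ada"}})
print(task.estimated_hours, task.is_blocking, task.tags)

# Only fields without defaults are listed as required in the JSON schema.
print(ProjectTask.model_json_schema()["required"])
```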

Prompt engineering

Low temperature: Use 0.1-0.3 for consistent outputs.
Provide schema: Include the Pydantic model's JSON schema and a few example outputs in the prompt.
Provide context: Give background for complex schemas.
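One possible way to apply these tips is to assemble the request arguments in a helper before calling the API. This is a sketch assuming Pydantic v2. `build_request`, the prompt wording, and the example task are hypothetical, and nothing is sent over the network here.

```python
import json

from pydantic import BaseModel


class Task(BaseModel):
    title: str
    priority: str
    due_date: str
    description: str


def build_request(user_prompt: str) -> dict:
    """Assemble keyword arguments for a chat completion call (hypothetical helper)."""
    schema_text = json.dumps(Task.model_json_schema(), indent=2)
    # A hand-written few-shot example, serialized with the same schema.
    example = Task(
        title="Send invoice",
        priority="medium",
        due_date="2025-03-01",
        description="Email the March invoice to the client.",
    )
    system_prompt = (
        "Extract a task from the user's message.\n"
        f"Respond with JSON matching this schema:\n{schema_text}\n"
        f"Example output:\n{example.model_dump_json()}"
    )
    return {
        "model": "not-required",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.2,  # within the 0.1-0.3 range suggested above
        "response_format": Task,
    }


request = build_request("Review the quarterly report by next Friday")
print(request["messages"][0]["content"])
```

The resulting dictionary could then be unpacked into the call from the quick start, for example `client.beta.chat.completions.parse(**request)`.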

Related

Engine-Builder-LLM overview: Dense model documentation.
BIS-LLM overview: MoE model documentation.
Quantization guide: FP8/FP4 trade-offs.
