Quick start
Structured outputs require two components: a Pydantic schema defining your expected output format, and an API call that enforces that schema.Define a schema
Generate structured output
base_url to your model’s production endpoint. Pass your Pydantic class to response_format and use beta.chat.completions.parse instead of the regular create method.
The response includes a parsed attribute with your data already converted to a Task object, so no JSON parsing is needed.
Engine support
Structured outputs are compatible with:- Engine-Builder-LLM, except when Lookahead speculative decoding is configured.
- BIS-LLM: except for a few exceptions like overlap scheduler enabled.
Model support
All Engine-Builder-LLM and BIS-LLM models support structured outputs out of the box with no additional configuration required.Best practices
Schema design
- Keep schemas simple: 2-3 levels max nesting for best results.
- Use basic types: str, int, float, bool when possible.
- Set defaults: Provide reasonable default values for optional fields.
- Descriptive names: Use clear, descriptive field names.
Prompt engineering
- Low temperature: Use 0.1-0.3 for consistent outputs.
- Provide schema: Dump the model schema and few-shot examples into context.
- Provide context: Give background for complex schemas.
Further reading
- Engine-Builder-LLM overview: Dense model documentation.
- BIS-LLM overview: MoE model documentation.
- Quantization guide:
FP8/FP4trade-offs.