Structured outputs requires an LLM deployed using the TensorRT-LLM Engine Builder.If you want to try this structured output example code for yourself, deploy this implementation of Llama 3.1 8B.
- Define an object schema with Pydantic.
- Pass the schema to the LLM with the
response_format
argument. - Receive output that is guaranteed to match the provided schema, including types and validations like
max_length
.
Schema generation with Pydantic
Pydantic is an industry standard Python library for data validation. With Pydantic, we’ll build precise schemas for LLM output to match. For example, here’s a schema for a basicPerson
object.
max_length
.
Add response format to LLM call
The first time that you pass a given schema for the model, it can take a
minute for the schema to be processed and cached. Subsequent calls with the
same schema will run at normal speeds.
response_format
field: