Plain text is what models output by default. Structured output is the craft of getting them to output something a downstream program can parse — JSON, XML, CSV, a specific format you defined. It sounds trivial. It is the source of a surprising amount of production AI pain.
The three levels of structure
Level 1: Ask nicely. "Respond in JSON with the keys name and age." Works ~95% of the time on a frontier model, but fails on edge cases in ways your tests don't catch.
Level 2: Schema-aware prompting. Show the model a JSON Schema or TypeScript interface, give a few-shot example, and explicitly ask for valid JSON with no prose and no markdown fences. Works ~99% of the time.
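A minimal sketch of a Level 2 prompt in Python, assuming a hypothetical person-extraction task built on the name/age example. PERSON_SCHEMA, build_prompt, and parse_reply are illustrative names, not any library's API, the few-shot data is made up, and the model call itself is left out.

```python
import json

# Hypothetical schema for the name/age example; any JSON Schema works here.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

# One few-shot pair (made-up data): an input and the exact JSON we want back.
FEW_SHOT_INPUT = "Sam Lee turned 29 last week."
FEW_SHOT_OUTPUT = {"name": "Sam Lee", "age": 29}


def build_prompt(text: str) -> str:
    """Assemble a schema-aware prompt: schema, one example, explicit format rules."""
    return "\n\n".join([
        "Extract the person described in the input.",
        "Respond with one JSON object matching this JSON Schema:",
        json.dumps(PERSON_SCHEMA, indent=2),
        "Example input:\n" + FEW_SHOT_INPUT,
        "Example output:\n" + json.dumps(FEW_SHOT_OUTPUT),
        "Output valid JSON only: no prose, no markdown fences.",
        "Input:\n" + text,
    ])


def parse_reply(reply: str) -> dict:
    """Parse strictly: prose, fences, or a non-object reply raise ValueError."""
    obj = json.loads(reply)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj
```

Parsing strictly, rather than trying to salvage JSON out of a chatty reply, keeps the 1% failures visible instead of hiding them.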
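Level 3: Constrained decoding. The model provider (or your inference library) enforces the schema at the token level during generation: at every step, sampling is restricted to tokens that keep the output schema-valid. You get schema-valid JSON by construction, with one caveat: if generation stops early (for example, at the max-token limit), the object can still be cut off partway. OpenAI calls this "Structured Outputs" (a JSON schema response format with strict: true); Anthropic offers tool-use schemas; in open source, libraries like Outlines and Guidance implement it.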
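To make the masking idea concrete, here is a toy constrained greedy decoder in Python. Everything in it is a stand-in: the "schema" is a fixed set of two allowed strings rather than a compiled JSON Schema, the vocabulary has seven tokens, and fake_scores is a hypothetical substitute for model logits that deliberately prefers off-schema tokens. Real implementations compile the schema into a grammar or automaton and set disallowed logits to negative infinity before sampling; this sketch only shows the principle.

```python
# Toy "schema": the only valid outputs. Real systems compile a JSON Schema
# into a grammar or automaton; a finite set keeps the idea visible.
ALLOWED_OUTPUTS = ['{"label": "yes"}', '{"label": "no"}']

# Toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ['{"label": ', '"yes"', '"no"', '"maybe"', '}', 'Sure! ', '```']


def fake_scores(prefix: str) -> dict[str, float]:
    """Hypothetical stand-in for model logits; it favors chatty, off-schema tokens."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores['Sure! '] = 3.0   # the model would love to add prose
    scores['```'] = 2.5      # or a markdown fence
    scores['"maybe"'] = 2.0  # or an answer outside the allowed set
    scores['"no"'] = 1.0
    return scores


def is_valid_prefix(text: str) -> bool:
    """True if text can still be extended into some allowed output."""
    return any(out.startswith(text) for out in ALLOWED_OUTPUTS)


def constrained_greedy_decode(max_steps: int = 10) -> str:
    text = ""
    for _ in range(max_steps):
        if text in ALLOWED_OUTPUTS:
            return text  # complete and schema-valid
        scores = fake_scores(text)  # scores for the whole vocabulary, as usual
        # The mask: keep only tokens that leave the output a valid prefix.
        allowed = {tok: s for tok, s in scores.items() if is_valid_prefix(text + tok)}
        if not allowed:
            raise RuntimeError("no valid continuation")
        text += max(allowed, key=allowed.get)  # greedy pick among allowed tokens
    return text  # may be incomplete if max_steps ran out (truncation)
```

Called with the defaults, the decoder returns '{"label": "no"}' even though the fake scores favor "Sure! " and "maybe". With max_steps=2 it stops at '{"label": "no"', showing that a generation cut off early can be incomplete even though every token it emitted was allowed.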
The gotcha that's not about parsing
Even when the JSON is perfectly valid, the content can be wrong. With a loose schema, the model can produce {"price": "free"} when you wanted a number, or {"date": "tomorrow"} when you needed an ISO 8601 string. A strict schema narrows this, but it only constrains the shape: declare price as a number and the model can still return 0 for an item whose price the input never mentions. Correctness still depends on the model.
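A minimal sketch of checking meaning after parsing, in Python. validate_order is a hypothetical helper; the "price must be positive" rule is an assumed business rule for illustration, and the date check relies on the standard library's date.fromisoformat.

```python
import json
from datetime import date


def validate_order(raw: str) -> dict:
    """Parse an order object, then check meaning that types alone don't capture."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    errors = []

    price = obj.get("price")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        errors.append(f"price must be a number, got {price!r}")
    elif price <= 0:  # assumed business rule: no zero or negative prices
        errors.append(f"price must be positive, got {price!r}")

    raw_date = obj.get("date")
    try:
        # Accepts YYYY-MM-DD (Python 3.11+ also accepts other ISO 8601 date forms).
        date.fromisoformat(raw_date)
    except (TypeError, ValueError):
        errors.append(f"date must be an ISO 8601 date, got {raw_date!r}")

    if errors:
        raise ValueError("; ".join(errors))
    return obj
```

For example, validate_order('{"price": "free", "date": "tomorrow"}') raises a single ValueError that lists both problems, which makes the failure easy to log or feed back to the model.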
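The practical move: combine strict-mode JSON with explicit field descriptions ({"price": "amount in USD as a number"}) and let the model reason in prose before it commits to an answer ("First, identify the price. Then…"). Because strict mode constrains the entire output, that reasoning goes either in a separate step before the JSON call or in a reasoning field that comes first in the schema.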
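One way to combine field descriptions with reason-first output in a single strict call, sketched as a Python dict in JSON Schema form. The schema, its field names, and answer_fields are hypothetical; the sketch assumes your decoder emits keys in the order the schema lists them (check your provider's documentation), so the reasoning string is written before price and date.

```python
import json

# Hypothetical extraction schema. "reasoning" is listed first so that, assuming
# the decoder emits keys in schema order, the model writes its working before
# committing to price and date.
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {
            "type": "string",
            "description": "Think step by step: first identify the price, then the date.",
        },
        "price": {
            "type": "number",
            "description": "Amount in USD as a number, no currency symbol.",
        },
        "date": {
            "type": "string",
            "description": "Purchase date in ISO 8601 format (YYYY-MM-DD).",
        },
    },
    "required": ["reasoning", "price", "date"],
    "additionalProperties": False,
}


def answer_fields(parsed: dict) -> dict:
    """Drop the reasoning scratchpad before handing the result downstream."""
    return {key: value for key, value in parsed.items() if key != "reasoning"}


# Example with a made-up model reply.
reply = '{"reasoning": "The receipt shows $12.50 dated 2024-05-02.", "price": 12.5, "date": "2024-05-02"}'
print(answer_fields(json.loads(reply)))  # {'price': 12.5, 'date': '2024-05-02'}
```

The descriptions double as instructions: they travel with the schema, so the model sees them at the point where each field is generated.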
When structure costs you
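Constrained decoding can be slightly slower than free generation: on top of the normal forward pass, the decoder has to work out at every step which tokens the schema still allows and mask out the rest. Some providers also add latency the first time they see a new schema, while they compile it. For long outputs this adds up.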
More importantly, a tight schema can prevent the model from doing useful work. If your schema demands a category from a fixed list of three but the right answer is "none of the above," the model picks the least-wrong one. Always include an "other" / "unknown" option.
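A sketch of a classification schema with an escape hatch, in Python. The ticket categories, the other_description field, and the route function are hypothetical; the point is that "other" is a first-class enum value with its own downstream path, instead of being silently folded into the nearest category.

```python
# Hypothetical support-ticket classifier: three real categories plus "other".
CATEGORY_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "bug_report", "feature_request", "other"],
        },
        "other_description": {
            "type": ["string", "null"],
            "description": "Short free-text label if category is 'other', else null.",
        },
    },
    "required": ["category", "other_description"],
    "additionalProperties": False,
}


def route(result: dict) -> str:
    """Send 'other' to a person instead of forcing it into the nearest queue."""
    if result["category"] == "other":
        return "human_review"
    return result["category"] + "_queue"
```

The free-text other_description also tells you which categories are missing from the list when the "other" bucket starts filling up.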
Why this matters for your work
Anything you build that pipes AI output into another system needs structured output. Logs into a dashboard, extractions into a database, classifications into a workflow — all of it. The 1-in-100 silent parse failure is the difference between a demo and a production system you can sleep through.
Always validate the parsed object against expectations — even with strict mode. The shape is enforced; the meaning isn't.
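A minimal sketch of a pipeline step that parses, validates, and fails loudly, in Python. call_model is a hypothetical callable standing in for whatever client you use, and validate is any function that raises ValueError when the meaning is wrong. On failure the error is fed back to the model and the call retried; after max_attempts the step raises rather than passing a bad object downstream.

```python
import json
from typing import Callable, Optional


def parse_and_validate(
    call_model: Callable[[str], str],
    prompt: str,
    validate: Optional[Callable[[dict], None]] = None,
    max_attempts: int = 3,
) -> dict:
    """Parse and validate a model reply, retrying with feedback; raise if it never passes."""
    current_prompt = prompt
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        reply = call_model(current_prompt)
        try:
            obj = json.loads(reply)  # JSONDecodeError subclasses ValueError
            if not isinstance(obj, dict):
                raise ValueError("expected a JSON object")
            if validate is not None:
                validate(obj)  # expected to raise ValueError on wrong meaning
            return obj
        except ValueError as err:
            last_error = err
            # Retry with the original prompt plus the reason for rejection.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {err}. "
                "Reply with a corrected JSON object only."
            )
    # Fail loudly: never pass an unparsed or invalid object downstream.
    raise RuntimeError(f"no valid reply after {max_attempts} attempts: {last_error}")
```

Passing a semantic check, such as a price or date validator, as validate makes the retry cover meaning as well as syntax.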
What to read next
Function calling is structured output applied to "which tool to call." Temperature sampling is the wider control surface. Chain of thought + structured output is the pattern of "reason in prose, then constrain the conclusion."