Updated Feb 28, 2026

2026-02-22

Stop Parsing Strings: How to Enforce Structured JSON Outputs in AI Agents

Avnish Yadav

Developer & Automation Builder

6 min read · AI Automation Python Dev Tools · #Structured LLM outputs #Pydantic AI #OpenAI function calling #JSON mode vs function calling #Reliable AI agents

A technical guide for developers on replacing fragile regex parsing with robust, schema-enforced LLM outputs using Python and Pydantic.

I still remember the first time my production cron job crashed at 3 AM. I was building a simple lead enrichment agent that took unstructured email signatures and converted them into CRM-ready JSON.

It worked perfectly in testing. But in production, on the 45th run, the LLM didn't just return the JSON object. It decided to be helpful. It returned:

Here is the data you requested:
{
  "name": "John Doe",
  ...
}

My Python script, expecting a clean JSON string, threw a parsing error. The pipeline died. I woke up to angry Slack messages.
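The failure is easy to reproduce. Here is a minimal sketch, using a made-up reply string in place of a real model response (the `raw_reply` text and the `parse_strict` helper are illustrative, not the original pipeline code):

```python
import json

# Hypothetical model reply: valid JSON wrapped in a friendly preamble.
raw_reply = 'Here is the data you requested:\n{\n  "name": "John Doe"\n}'


def parse_strict(text: str) -> dict:
    """Parse text as JSON; any wrapper text raises json.JSONDecodeError."""
    return json.loads(text)


try:
    parse_strict(raw_reply)
    failed = False
except json.JSONDecodeError as exc:
    failed = True
    error_message = str(exc)  # the parser gives up at the very first character
```

The JSON inside the reply is fine. The parser never reaches it, because the first character it sees is the "H" of "Here".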

Honestly, I see this mistake in almost every junior developer's code. We treat LLMs like databases, expecting them to respect our silent agreements. But LLMs are probabilistic engines, not deterministic functions. If you want reliability, you can't just ask nicely in the prompt. You have to enforce structure at the architectural level.

Here is how I stopped fighting with regex and started enforcing schemas using Pydantic and function calling.

The Probability Trap

When you prompt an LLM with "Return only JSON," you are fighting against its training. Much of its training data wraps code and data in conversational text, and chat models are tuned to be helpful. The model wants to talk to you.

In the early days, I tried to solve this with prompt engineering alone:

SYSTEM: You are a JSON machine. Do not speak. Do not add markdown. Return ONLY the JSON object.

This works... mostly. But "mostly" is unacceptable in enterprise automation. At scale, a 1% failure rate on a system processing 10,000 requests a day means 100 broken records every day.
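The arithmetic is worth spelling out. This is a rough sketch that assumes every request fails independently with the same probability, which real traffic will only approximate:

```python
def expected_failures(requests: int, failure_rate: float) -> float:
    """Average number of broken records for a given volume."""
    return requests * failure_rate


def prob_at_least_one_failure(runs: int, failure_rate: float) -> float:
    """Chance that at least one of `runs` independent calls fails."""
    return 1.0 - (1.0 - failure_rate) ** runs


daily_broken = expected_failures(10_000, 0.01)       # about 100 per day
first_45_runs = prob_at_least_one_failure(45, 0.01)  # about 0.36
```

Under that assumption, a 1% failure rate gives you better than a one-in-three chance of hitting a bad response within the first 45 runs, so a clean test session proves very little.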

Strategy 1: Schema Injection (The TypeScript Hack)

If you aren't using function calling APIs yet, or if you're using a smaller open-source model, the most effective way to stabilize output is by injecting the schema definition directly into the system prompt.

In my experience, LLMs follow TypeScript interfaces more reliably than most other schema definition languages, probably because of the sheer volume of TS code in their training sets.

Instead of describing the fields in English, I paste this into my prompt:

You must output a valid JSON object matching this TypeScript interface:

interface LeadData {
  firstName: string;
  lastName: string;
  email: string | null;
  confidenceScore: number; // 0.0 to 1.0
}

This provides two things: structure and type hints. The comment // 0.0 to 1.0 acts as a soft constraint that the model respects surprisingly well. I use this heavily when prototyping micro-SaaS agents before locking down the stack.
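Because the interface is only a hint, the reply still needs checking on the Python side. Here is a minimal sketch of both halves, using only the standard library. The helper names `build_system_prompt` and `check_lead` are made up for illustration, and the closing "Return only the JSON object." line is an assumed addition to the prompt:

```python
import json

# The TypeScript interface from the prompt above.
LEAD_INTERFACE = """interface LeadData {
  firstName: string;
  lastName: string;
  email: string | null;
  confidenceScore: number; // 0.0 to 1.0
}"""


def build_system_prompt(interface: str) -> str:
    """Embed the interface in the system prompt."""
    return (
        "You must output a valid JSON object matching this TypeScript interface:\n\n"
        f"{interface}\n\n"
        "Return only the JSON object."
    )


def check_lead(raw: str) -> dict:
    """Parse the reply and enforce what the prompt could only suggest."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key in ("firstName", "lastName"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"{key} must be a string")
    if "email" not in data or not isinstance(data["email"], (str, type(None))):
        raise ValueError("email must be a string or null")
    score = data.get("confidenceScore")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("confidenceScore must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidenceScore must be between 0.0 and 1.0")
    return data
```

The range check is the important part: the `// 0.0 to 1.0` comment steers the model, but nothing in the prompt stops it from returning 1.7.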

Strategy 2: The Pydantic Pattern (Production Grade)

For actual production systems, relying on prompt syntax isn't enough. I use Python's Pydantic library coupled with OpenAI's function calling (or structured outputs) to constrain the shape of the data and validate what comes back.

Here is the pattern I use in 90% of my Python-based automation:

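from typing import List, Literal

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()  # reads OPENAI_API_KEY from the environment
user_input = "Hi, our checkout page is down. Please call me back today."  # example input

# 1. Define the Schema
class LeadExtraction(BaseModel):
    summary: str = Field(..., description="A brief summary of the user's request")
    # Literal emits an "enum" in the JSON schema and is enforced by Pydantic
    urgency: Literal["low", "medium", "high"]
    action_items: List[str]

# 2. Force the model to use it
completion = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": user_input}],
    tools=[{
        "type": "function",
        "function": {
            "name": "extract_lead_data",
            "description": "Extracts structured data from email text",
            "parameters": LeadExtraction.model_json_schema()
        }
    }],
    tool_choice={"type": "function", "function": {"name": "extract_lead_data"}}
)

# 3. Validate the tool-call arguments (a JSON string) against the schema
tool_call = completion.choices[0].message.tool_calls[0]
lead = LeadExtraction.model_validate_json(tool_call.function.arguments)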

Why this wins:

Validation: If the LLM omits a required field or returns a value of the wrong type, Pydantic raises a ValidationError as soon as you parse the arguments into the model.
Type Safety: You get real integers and booleans, not strings like "true" that you have to parse manually.
Enums: The enum constraint in the schema tells the LLM to choose from your specific list (e.g., "low", "medium", "high"). With strict mode configured correctly, it cannot output "Very High". Without strict mode the enum is only a strong hint, so the value still needs validating on your side.
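The validation side can be tried without calling any API. This sketch assumes Pydantic v2 is installed and uses a variant of the `LeadExtraction` model that declares `urgency` with `typing.Literal`. The JSON strings stand in for tool-call arguments, and `try_parse` is a hypothetical helper:

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class LeadExtraction(BaseModel):
    summary: str
    urgency: Literal["low", "medium", "high"]
    action_items: List[str]


# A well-formed payload parses into a typed object.
good = LeadExtraction.model_validate_json(
    '{"summary": "Needs a demo", "urgency": "high", "action_items": ["Book call"]}'
)


def try_parse(raw: str) -> str:
    """Return 'ok', or the name of the first field that failed validation."""
    try:
        LeadExtraction.model_validate_json(raw)
        return "ok"
    except ValidationError as exc:
        return str(exc.errors()[0]["loc"][0])


# An out-of-list enum value and a missing required field are both rejected.
bad_enum = try_parse('{"summary": "x", "urgency": "Very High", "action_items": []}')
missing = try_parse('{"summary": "x", "urgency": "low"}')
```

One caveat: by default Pydantic ignores unknown extra keys rather than rejecting them, so an invented field only raises an error if the model is configured to forbid extras.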

Strategy 3: The Self-Healing Loop

Even with function calling, models sometimes generate invalid JSON (like unescaped quotes). In high-reliability workflows, I wrap my LLM calls in a retry loop that feeds the error back to the model.

I call this the "Reflection Pattern." It looks roughly like this:

Attempt 1: Ask LLM for JSON.
Validation: Try `json.loads(response)`.
Catch Error: If it fails, capture the exception message.
Attempt 2: Send a new message to the LLM: "You generated invalid JSON. The error was [Error Message]. Please fix it."

I've saved thousands of API calls using this simple loop rather than just discarding the failed attempt.
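The steps above can be sketched as a small retry helper. It assumes a caller-supplied `ask_llm` function that takes a list of chat messages and returns the model's text. That function, the name `ask_with_repair`, and the default of three attempts are placeholders for this sketch, not part of any SDK:

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def ask_with_repair(
    ask_llm: Callable[[List[Message]], str],
    prompt: str,
    max_attempts: int = 3,
) -> Any:
    """Call ask_llm until its reply parses as JSON, feeding errors back."""
    messages: List[Message] = [{"role": "user", "content": prompt}]
    last_error = None
    for _ in range(max_attempts):
        reply = ask_llm(messages)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as exc:
            last_error = exc
            # Keep the bad reply in the history so the model can see what to fix.
            messages.append({"role": "assistant", "content": reply})
            messages.append({
                "role": "user",
                "content": (
                    f"You generated invalid JSON. The error was: {exc}. "
                    "Please fix it."
                ),
            })
    raise ValueError(f"No valid JSON after {max_attempts} attempts: {last_error}")
```

Capping the attempts matters: a model that keeps producing the same broken output would otherwise loop forever and bill you for every round.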

My Recommendation

If you are building simple scripts, the TypeScript prompt injection is usually enough. But if you are charging customers for a product, you need Pydantic.

I don't deploy anything anymore without a defined schema. It turns the "magic" of AI into actual engineering. You wouldn't write a database query without knowing your column types, so don't write a prompt without knowing your output schema.

Frequently Asked Questions

What is the difference between JSON Mode and Function Calling?

JSON Mode only ensures the output is syntactically valid JSON; it doesn't guarantee specific fields or structure. Function Calling (or Tool Use) gives the model a specific schema to fill in, and strict structured outputs enforce it, which makes it much better for extracting data into databases.
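In request terms, the difference is whether a schema travels with the call. The sketch below only builds the keyword arguments you would pass to OpenAI's `chat.completions.create`; it makes no API call, and the `LEAD_SCHEMA` fields are illustrative:

```python
# Hypothetical JSON Schema for a lead record; field names are illustrative.
LEAD_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["name", "urgency"],
    "additionalProperties": False,
}

# JSON mode: the reply must be valid JSON, but no schema is attached.
json_mode_kwargs = {"response_format": {"type": "json_object"}}

# Function calling: the schema is sent as a tool definition, and
# tool_choice forces the model to call that one function.
function_call_kwargs = {
    "tools": [{
        "type": "function",
        "function": {
            "name": "extract_lead_data",
            "description": "Extracts structured data from email text",
            "parameters": LEAD_SCHEMA,
        },
    }],
    "tool_choice": {"type": "function", "function": {"name": "extract_lead_data"}},
}
```

With JSON mode, a reply of `{"foo": "bar"}` counts as a success as far as the API is concerned. With the tool definition, the model at least knows which fields and enum values you expect.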

Does using Pydantic increase token costs?

Slightly, yes. When you pass the Pydantic schema to the LLM, it is converted into a JSON schema and added to the system prompt or tool definition, which consumes input tokens. However, the cost is negligible compared to the reliability you gain.
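To get a feel for the overhead, you can serialize the schema and estimate its size. This is a back-of-the-envelope sketch: the four-characters-per-token ratio is a rough rule of thumb rather than a real tokenizer, the provider may serialize tool definitions differently, and the schema dict is a hand-written stand-in for what Pydantic would generate:

```python
import json

# Hand-written stand-in for a small generated JSON schema.
schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string", "description": "A brief summary of the user's request"},
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
        "action_items": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "urgency", "action_items"],
}


def rough_token_estimate(obj: object, chars_per_token: float = 4.0) -> int:
    """Estimate tokens from the compact JSON length (heuristic only)."""
    text = json.dumps(obj, separators=(",", ":"))
    return max(1, round(len(text) / chars_per_token))


schema_tokens = rough_token_estimate(schema)
```

For real numbers, count with the tokenizer that matches your model (for OpenAI models, the `tiktoken` library) and compare the usage figures the API reports with and without the tool definition.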

Can I use structured outputs with open-source models like Llama 3?

Yes. Many modern open-source models are fine-tuned for function calling. Libraries like Instructor or Outlines allow you to enforce Pydantic schemas on local models running via Ollama or vLLM.
