Output Parsers

Parse and validate LLM outputs into structured data with JSON, Zod schemas, lists, XML, CSV, and auto-fixing.

Why Output Parsers?

LLMs return unstructured text. Output parsers extract structured data, validate formats, and handle errors, making LLM outputs reliable for downstream processing.

# JSONParser

Extract and parse JSON from LLM responses, even when wrapped in markdown code blocks or mixed with text.

import { JSONParser } from '@orka-js/tools';
 
const parser = new JSONParser({ strict: false });
 
// LLM response with JSON in markdown
const llmOutput = `Here's the data you requested:

\`\`\`json
{
  "name": "Alice",
  "age": 30,
  "skills": ["TypeScript", "Python"]
}
\`\`\`
`;
 
const data = parser.parse(llmOutput);
console.log(data);
// { name: 'Alice', age: 30, skills: ['TypeScript', 'Python'] }
 
// Get format instructions for the LLM
const instructions = parser.getFormatInstructions();
console.log(instructions);

Smart Extraction

JSONParser automatically extracts JSON from markdown code blocks, handles both objects and arrays, and provides clear error messages when parsing fails.
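
To make the extraction behaviour concrete, here is a minimal sketch of how pulling JSON out of a mixed response could work. It is not the `@orka-js/tools` implementation: `extractJson` and its fallback rules are assumptions made for illustration only.

```typescript
// Hypothetical helper, NOT the library's code: a sketch of fence-aware JSON extraction.
const FENCE = '`'.repeat(3); // three backticks, built this way to keep the sample readable

function extractJson(text: string): unknown {
  // Prefer the contents of a fenced block (optionally tagged "json").
  const fencePattern = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');
  const fenced = fencePattern.exec(text);
  const candidate = fenced !== null ? fenced[1] ?? text : text;

  // Otherwise take the span from the first { or [ to the last } or ].
  const start = candidate.search(/[{[]/);
  const end = Math.max(candidate.lastIndexOf('}'), candidate.lastIndexOf(']'));
  if (start === -1 || end < start) {
    throw new Error('No JSON object or array found in LLM output');
  }

  try {
    return JSON.parse(candidate.slice(start, end + 1));
  } catch (err) {
    // Re-throw with context so the caller sees why parsing failed.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Invalid JSON in LLM output: ${reason}`);
  }
}
```

The sketch handles both objects and arrays because it looks for either opening bracket; it does not try to repair malformed JSON.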

# StructuredOutputParser

Parse and validate LLM outputs against a Zod schema for type-safe structured data.

📦 Installation Required

StructuredOutputParser requires Zod for schema validation:

npm install zod

import { StructuredOutputParser } from '@orka-js/tools';
import { z } from 'zod';
 
// Define schema
const schema = z.object({
  name: z.string().describe('Person name'),
  age: z.number().describe('Age in years'),
  email: z.string().email().describe('Email address'),
  skills: z.array(z.string()).describe('List of skills')
});
 
// Create parser
const parser = StructuredOutputParser.fromZodSchema(schema);
 
// Get format instructions to send to LLM
const instructions = parser.getFormatInstructions();
const prompt = `Extract person info from this text: "Alice is 30 years old..."
 
${instructions}`;
 
const llmResponse = await llm.generate(prompt);
 
// Parse and validate
try {
  const data = parser.parse(llmResponse.content);
  console.log(data);
  // { name: 'Alice', age: 30, email: 'alice@example.com', skills: [...] }
  // ✅ Type-safe and validated
} catch (error) {
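  // `error` is `unknown` in TypeScript, so narrow it before reading `.message`
  console.error('Validation failed:', error instanceof Error ? error.message : error);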
}

Naive Parsing: Unchecked JSON

JSON.parse(rawResponse);

High hallucination risk, silent runtime failures, and manual type casting.

Structured Extraction: Schema Enforcement

parser.parse(llmOutput);

Type-safe with Zod, automatic error catching, and production ready. Data integrity: schema validated.
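
The difference between the two approaches can be shown without any library. The sketch below uses a hand-written type guard as a stand-in for a Zod schema; `Person`, `isPerson`, `parseNaive`, and `parseValidated` are hypothetical names introduced only for this illustration.

```typescript
interface Person {
  name: string;
  age: number;
}

// Hand-written stand-in for a schema check (with Zod, the schema does this for you).
function isPerson(value: unknown): value is Person {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record['name'] === 'string' && typeof record['age'] === 'number';
}

// Naive: the cast compiles, but nothing checks the data at runtime.
function parseNaive(raw: string): Person {
  return JSON.parse(raw) as Person;
}

// Validated: a wrong shape fails at the parsing boundary instead of later.
function parseValidated(raw: string): Person {
  const value: unknown = JSON.parse(raw);
  if (!isPerson(value)) {
    throw new Error('LLM output does not match the Person shape');
  }
  return value;
}

const rawPerson = '{"name": "Alice", "age": "thirty"}';
const naive = parseNaive(rawPerson);
console.log(typeof naive.age); // "string", despite the declared number type
```

With the naive version the bad value travels through the program until something does arithmetic on it; the validated version throws immediately, which is the behaviour a schema-backed parser is meant to provide.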

# ListParser

Parse lists from LLM outputs, automatically handling bullet points, numbers, and custom separators.

import { ListParser } from '@orka-js/tools';
 
const parser = new ListParser({
  separator: '\n',  // Split by newline (default)
  trim: true        // Remove whitespace
});
 
const llmOutput = `Here are the top programming languages:
 
- TypeScript
- Python
- Go
- Rust
`;
 
const items = parser.parse(llmOutput);
console.log(items);
// ['TypeScript', 'Python', 'Go', 'Rust']
 
// Works with numbered lists too
const numbered = `1. First item
2. Second item
3. Third item`;
 
const items2 = parser.parse(numbered);
// ['First item', 'Second item', 'Third item']
 
// Custom separator
const csvParser = new ListParser({ separator: ',' });
const csv = 'apple, banana, orange';
console.log(csvParser.parse(csv));
// ['apple', 'banana', 'orange']
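
For intuition, list parsing of this kind can be approximated with a split followed by a marker-stripping regex. This is a sketch under assumptions, not the library's code: `parseListSketch`, the marker pattern, and the rule that unmarked lines are dropped when markers are present are all illustrative choices.

```typescript
// Hypothetical helper: split, trim, and strip bullet or number markers.
function parseListSketch(text: string, separator = '\n'): string[] {
  // Matches "- ", "* ", "• ", "1. " or "1) " at the start of an item.
  const marker = /^(?:[-*•]|\d+[.)])\s+/;

  const lines = text
    .split(separator)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // If any line carries a marker, treat unmarked lines (such as an intro sentence) as noise.
  const marked = lines.filter((line) => marker.test(line));
  const items = marked.length > 0 ? marked : lines;

  return items.map((line) => line.replace(marker, '').trim());
}
```

Dropping unmarked lines is what lets an introductory sentence such as "Here are the top programming languages:" fall away; a plain separator-delimited string has no markers, so every piece is kept.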

# AutoFixParser

Wraps any parser and automatically retries with LLM correction when parsing fails.

import { AutoFixParser, StructuredOutputParser } from '@orka-js/tools';
import { z } from 'zod';
 
const schema = z.object({
  name: z.string(),
  age: z.number()
});
 
const baseParser = StructuredOutputParser.fromZodSchema(schema);
 
const autoFixParser = new AutoFixParser({
  parser: baseParser,
  maxRetries: 3,
  llm: orka.getLLM()
});
 
// Malformed LLM output: "age" should be a number, not a string
const badOutput = `{
  "name": "Alice",
  "age": "thirty"
}`;
 
// Try to parse with auto-fix
try {
  const data = await autoFixParser.parseWithRetry(badOutput);
  console.log(data);
  // { name: 'Alice', age: 30 }  ✅ Fixed automatically
} catch (error) {
  console.error('Failed after retries:', error);
}

Step 1: Parse (Validation Phase)

Base Parsing Attempt: The engine applies the initial schema (JSON/Zod) to the LLM's raw response.

Step 2: Detect (Failure Handling)

Error Capture: If a syntax or schema error occurs, the trace and original output are preserved.

Step 3: Correct (Self-Healing)

LLM Reflection: A specific sub-prompt instructs the LLM to fix its own output based on the error log.

Step 4: Retry (Cycle Repeat)

Final Validation: The corrected output is re-parsed. This loop continues until success or maxRetries.
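
The four steps above amount to a bounded retry loop. The following is a minimal sketch of that loop, assuming a synchronous parse function and an async "fix" callback that stands in for the LLM call; `parseWithFixes`, `Parse`, `Fix`, `parseAge`, and `fakeFix` are hypothetical names and do not describe AutoFixParser's internals.

```typescript
type Parse<T> = (text: string) => T;
type Fix = (badOutput: string, errorMessage: string) => Promise<string>;

async function parseWithFixes<T>(
  output: string,
  parse: Parse<T>,
  fix: Fix,
  maxRetries: number,
): Promise<T> {
  let current = output;
  for (let attempt = 0; ; attempt++) {
    try {
      // Steps 1 and 4: parse (or re-parse) the current candidate.
      return parse(current);
    } catch (err) {
      // Step 2: capture the error together with the output that caused it.
      const message = err instanceof Error ? err.message : String(err);
      if (attempt >= maxRetries) {
        throw new Error(`Parsing failed after ${maxRetries} retries: ${message}`);
      }
      // Step 3: ask the model to correct its own output.
      current = await fix(current, message);
    }
  }
}

// Example parser: requires a numeric "age" field.
const parseAge: Parse<{ age: number }> = (text) => {
  const value = JSON.parse(text) as { age?: unknown };
  if (typeof value.age !== 'number') throw new Error('age must be a number');
  return { age: value.age };
};

// Stub standing in for an LLM correction call.
const fakeFix: Fix = async (bad) => bad.replace('"thirty"', '30');
```

In this sketch a call makes at most `maxRetries + 1` parse attempts and at most `maxRetries` fix calls, which is the cost to keep in mind when choosing the retry budget.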

# XMLParser

Parse XML-tagged outputs from LLMs. Useful when you need multiple named fields without JSON formatting; some LLMs handle XML tags more naturally than JSON.

import { XMLParser } from '@orka-js/tools';
 
// Basic usage — extract all XML tags
const parser = new XMLParser();
 
const llmOutput = `Here is my analysis:
 
<summary>The product has strong market potential</summary>
<sentiment>positive</sentiment>
<confidence>0.92</confidence>
<reasoning>Based on market trends and competitor analysis, the product fills a clear gap.</reasoning>`;
 
const data = parser.parse(llmOutput);
console.log(data);
// {
//   summary: 'The product has strong market potential',
//   sentiment: 'positive',
//   confidence: '0.92',
//   reasoning: 'Based on market trends and competitor analysis...'
// }
 
// Strict mode — require specific tags
const strictParser = new XMLParser({
  tags: ['summary', 'sentiment', 'confidence'],
  strict: true  // Throws if any required tag is missing
});
 
const result = strictParser.parse(llmOutput);
// ✅ Validates that all required tags are present
 
// Get format instructions for the LLM
console.log(strictParser.getFormatInstructions());
// "Your response must use the following XML tags:
//  <summary>value</summary>
//  <sentiment>value</sentiment>
//  <confidence>value</confidence>"
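
As a rough illustration of tag-based extraction, the sketch below collects flat `<tag>value</tag>` pairs with a regular expression and optionally checks for required tags. It is an assumption-laden approximation, not XMLParser's implementation: `extractTags` is a hypothetical name, nested tags and attributes are not handled, and all tags found are returned even when a required list is given.

```typescript
// Hypothetical helper: extract flat <tag>value</tag> pairs from free text.
function extractTags(text: string, required: string[] = []): Record<string, string> {
  const result: Record<string, string> = {};
  // The backreference \1 forces the closing tag to match the opening tag name.
  const pattern = /<([A-Za-z_][\w-]*)>([\s\S]*?)<\/\1>/g;

  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const tag = match[1] ?? '';
    const value = match[2] ?? '';
    result[tag] = value.trim(); // values stay strings, e.g. '0.92'
  }

  // Strict-style check: fail if any required tag is absent.
  const missing = required.filter((tag) => !(tag in result));
  if (missing.length > 0) {
    throw new Error(`Missing required tags: ${missing.join(', ')}`);
  }
  return result;
}
```

Values are left as strings, matching the `confidence: '0.92'` output shown earlier; converting them to numbers or booleans is a separate step.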

# CSVParser

Parse CSV-formatted outputs into arrays of objects. Handles quoted fields, custom separators, and optional predefined headers. Ideal for tabular data extraction from LLMs.

import { CSVParser } from '@orka-js/tools';
 
// Auto-detect headers from first row
const parser = new CSVParser();
 
const llmOutput = `name,role,experience
Alice,Engineer,5 years
Bob,Designer,3 years
Charlie,Manager,8 years`;
 
const data = parser.parse(llmOutput);
console.log(data);
// [
//   { name: 'Alice', role: 'Engineer', experience: '5 years' },
//   { name: 'Bob', role: 'Designer', experience: '3 years' },
//   { name: 'Charlie', role: 'Manager', experience: '8 years' }
// ]
 
// Predefined headers (no header row in data)
const noHeaderParser = new CSVParser({
  headers: ['product', 'price', 'stock'],
  separator: ';',  // Custom separator
  strict: true     // Enforce column count
});
 
const semicolonData = `iPhone;999;true
MacBook;1999;false`;
 
console.log(noHeaderParser.parse(semicolonData));
// [
//   { product: 'iPhone', price: '999', stock: 'true' },
//   { product: 'MacBook', price: '1999', stock: 'false' }
// ]
 
// Handles quoted fields with commas
const quotedCSV = `name,description
"Smith, John","Senior engineer, 10+ years"`;
console.log(parser.parse(quotedCSV));
// [{ name: 'Smith, John', description: 'Senior engineer, 10+ years' }]
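
Quoted fields are the part of CSV parsing that a plain `split(',')` gets wrong. The sketch below shows one common way to handle them with a small character-by-character scanner; `splitCsvLine` and `parseCsvSketch` are hypothetical helpers, and the sketch assumes a single-character separator and no line breaks inside quoted fields.

```typescript
// Hypothetical helper: split one CSV line, honouring double-quoted fields.
function splitCsvLine(line: string, separator = ','): string[] {
  const fields: string[] = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        field += '"'; // "" inside quotes is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // separators inside quotes are literal
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === separator) {
      fields.push(field.trim());
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field.trim());
  return fields;
}

// Hypothetical helper: first row is the header, remaining rows become objects.
function parseCsvSketch(text: string, separator = ','): Record<string, string>[] {
  const [headerLine, ...rows] = text.split(/\r?\n/).filter((l) => l.trim().length > 0);
  if (headerLine === undefined) return [];
  const headers = splitCsvLine(headerLine, separator);
  return rows.map((line) => {
    const cells = splitCsvLine(line, separator);
    const row: Record<string, string> = {};
    headers.forEach((header, i) => { row[header] = cells[i] ?? ''; });
    return row;
  });
}
```

Missing cells default to an empty string here; a strict mode would instead compare `cells.length` with `headers.length` and throw on a mismatch.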

# CommaSeparatedListParser

A specialized parser for comma-separated lists. Simpler than CSVParser when you just need a flat list of values. Supports deduplication and automatic trimming.

import { CommaSeparatedListParser } from '@orka-js/tools';
 
const parser = new CommaSeparatedListParser({
  trim: true,              // Remove whitespace
  removeDuplicates: false  // Keep duplicates
});
 
const llmOutput = 'TypeScript, Python, Go, Rust, JavaScript';
const items = parser.parse(llmOutput);
console.log(items);
// ['TypeScript', 'Python', 'Go', 'Rust', 'JavaScript']
 
// With deduplication
const deduper = new CommaSeparatedListParser({ removeDuplicates: true });
const dupes = 'apple, banana, apple, orange, banana';
console.log(deduper.parse(dupes));
// ['apple', 'banana', 'orange']
 
// Format instructions for the LLM
console.log(parser.getFormatInstructions());
// "Your response must be a comma-separated list of values.
//  Example: item1, item2, item3"
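
The trimming and deduplication options described above map onto a few lines of array code. This is an illustrative sketch only; `parseCommaList` is a hypothetical function, and dropping empty items is an assumption rather than documented behaviour.

```typescript
// Hypothetical helper: split on commas, trim, optionally deduplicate.
function parseCommaList(text: string, removeDuplicates = false): string[] {
  const items = text
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item.length > 0); // assumption: empty entries are dropped

  // A Set preserves insertion order, so the first occurrence of each value wins.
  return removeDuplicates ? Array.from(new Set(items)) : items;
}
```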

Complete Example

import { createOrka } from '@orka-js/core';
import { OpenAIAdapter } from '@orka-js/openai';
import { StructuredOutputParser, AutoFixParser } from '@orka-js/tools';
import { z } from 'zod';
 
const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: /* ... */
});
 
// Define schema
const productSchema = z.object({
  name: z.string(),
  price: z.number(),
  category: z.enum(['electronics', 'clothing', 'food']),
  inStock: z.boolean(),
  tags: z.array(z.string())
});
 
// Create parser with auto-fix
const baseParser = StructuredOutputParser.fromZodSchema(productSchema);
const parser = new AutoFixParser({
  parser: baseParser,
  maxRetries: 2,
  llm: orka.getLLM()
});
 
// Generate structured output
const prompt = `Extract product information from this description:
The iPhone 15 Pro costs $999 and is currently available. 
It's an electronics item with tags: smartphone, apple, 5g
 
${baseParser.getFormatInstructions()}`;
 
const response = await orka.generate(prompt);
 
// Parse with validation and auto-fix
const product = await parser.parseWithRetry(response);
 
console.log(product);
// {
//   name: 'iPhone 15 Pro',
//   price: 999,
//   category: 'electronics',
//   inStock: true,
//   tags: ['smartphone', 'apple', '5g']
// }
// ✅ Type-safe, validated, and auto-corrected if needed

Comparison

Output Parser	Extraction Logic	Validation Strategy
`StructuredOutputParser` (Enterprise)	Complex type-safe objects: Zod-driven mapping	Active check
`AutoFixParser` (Resilient)	Self-healing data: recursive LLM repair	Active check
`JSONParser` (Classic)	Standard object data: strict JSON parsing	Active check
`XMLParser` (Robust)	Tag-based multi-fields: structural tag isolation	Active check
`ListParser` (Clean)	Array collections: regex-based splitting	Active check
`CSVParser` (Legacy)	Tabular data streams: columnar extraction	Active check
`CommaSeparatedListParser` (Light)	Quick tags and enums: comma-delimited	Active check

Best Practices

1. Include Format Instructions

Always add parser.getFormatInstructions() to your prompts to guide the LLM.
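
One way to make this habit hard to forget is a small prompt builder that always appends the instructions. The sketch below assumes only that a parser exposes `getFormatInstructions()`, as shown throughout this page; `HasFormatInstructions` and `buildPrompt` are hypothetical names.

```typescript
// Minimal shape shared by the parsers shown on this page.
interface HasFormatInstructions {
  getFormatInstructions(): string;
}

// Hypothetical helper: append the parser's format instructions to any task prompt.
function buildPrompt(task: string, parser: HasFormatInstructions): string {
  return `${task.trim()}\n\n${parser.getFormatInstructions()}`;
}
```

Routing every prompt through a helper like this keeps the instructions and the parser in sync: swapping the parser automatically changes what the LLM is asked to produce.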

2. Use Zod for Complex Schemas

StructuredOutputParser with Zod provides type safety, validation, and clear error messages.

3. Use AutoFix Sparingly

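AutoFixParser makes extra LLM calls: each retry sends the failed output back to the model, so a single parse can cost up to maxRetries additional calls. Reserve it for critical data or for cases where LLM outputs are unreliable.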

Tree-shaking Imports

// ✅ Import only what you need
import { StructuredOutputParser } from '@orka-js/tools';
import { AutoFixParser } from '@orka-js/tools';
import { XMLParser } from '@orka-js/tools';
import { CSVParser } from '@orka-js/tools';
import { CommaSeparatedListParser } from '@orka-js/tools';
 
// ✅ Or combine several named imports in one statement
import { JSONParser, ListParser, XMLParser, CSVParser } from '@orka-js/tools';