OrkaJS

Output Parsers

Parse and validate LLM outputs into structured data with JSON, Zod schemas, lists, and auto-fixing.

Why Output Parsers?

LLMs return unstructured text. Output parsers extract structured data, validate formats, and handle errors, making LLM outputs reliable for downstream processing.

# JSONParser

Extract and parse JSON from LLM responses, even when wrapped in markdown code blocks or mixed with text.

import { JSONParser } from 'orkajs/parsers/json';
 
const parser = new JSONParser({ strict: false });
 
// LLM response with JSON in markdown
const llmOutput = `Here's the data you requested:
 
\`\`\`json
{
  "name": "Alice",
  "age": 30,
  "skills": ["TypeScript", "Python"]
}
\`\`\`
`;
 
const data = parser.parse(llmOutput);
console.log(data);
// { name: 'Alice', age: 30, skills: ['TypeScript', 'Python'] }
 
// Get format instructions for the LLM
const instructions = parser.getFormatInstructions();
console.log(instructions);

🎯 Smart Extraction

JSONParser automatically extracts JSON from markdown code blocks, handles both objects and arrays, and provides clear error messages when parsing fails.
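
To make the extraction behavior concrete, here is a minimal sketch of how such a step could work. It is not OrkaJS's actual implementation: `extractJson` and `FENCE` are hypothetical names, and the strategy (prefer a fenced block, otherwise take the span from the first opening bracket to the last closing one) is an assumption.

```typescript
// Hypothetical helper for illustration; not OrkaJS's actual implementation.
const FENCE = '`'.repeat(3); // three backticks, built indirectly so the snippet nests cleanly in docs

function extractJson(text: string): unknown {
  // 1. Prefer the contents of a fenced block, with or without a "json" language tag.
  const fencePattern = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');
  const fenced = text.match(fencePattern);
  const candidate = fenced ? fenced[1] : text;

  // 2. Take the span from the first opening bracket to the last closing bracket.
  const start = candidate.search(/[{[]/);
  const end = Math.max(candidate.lastIndexOf('}'), candidate.lastIndexOf(']'));
  if (start === -1 || end <= start) {
    throw new Error('No JSON object or array found in LLM output');
  }

  // 3. Parse, and rethrow with a message that says what went wrong.
  try {
    return JSON.parse(candidate.slice(start, end + 1));
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error('Found JSON-like text but could not parse it: ' + reason);
  }
}
```

A sketch like this handles both objects and arrays, but it would be confused by stray brackets in the surrounding prose.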

# StructuredOutputParser

Parse and validate LLM outputs against a Zod schema for type-safe structured data.

📦 Installation Required

StructuredOutputParser requires Zod for schema validation:

npm install zod

import { StructuredOutputParser } from 'orkajs/parsers/structured';
import { z } from 'zod';
 
// Define schema
const schema = z.object({
  name: z.string().describe('Person name'),
  age: z.number().describe('Age in years'),
  email: z.string().email().describe('Email address'),
  skills: z.array(z.string()).describe('List of skills')
});
 
// Create parser
const parser = StructuredOutputParser.fromZodSchema(schema);
 
// Get format instructions to send to LLM
const instructions = parser.getFormatInstructions();
const prompt = `Extract person info from this text: "Alice is 30 years old..."
 
${instructions}`;
 
// `llm` is an LLM instance created elsewhere (for example via orka.getLLM())
const llmResponse = await llm.generate(prompt);
 
// Parse and validate
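try {
  const data = parser.parse(llmResponse.content);
  console.log(data);
  // { name: 'Alice', age: 30, email: 'alice@example.com', skills: [...] }
  // ✅ Type-safe and validated
} catch (error) {
  // `error` is typed `unknown` in strict TypeScript, so narrow it before reading `.message`
  console.error('Validation failed:', error instanceof Error ? error.message : error);
}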

Without Validation

const data = JSON.parse(response);
// No type safety
// No validation
// Runtime errors

With StructuredOutputParser

const data = parser.parse(response);
// ✅ Type-safe
// ✅ Validated
// ✅ Clear errors
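
The difference is easier to see with a concrete failure. The sketch below uses a hand-written check as a stand-in for Zod schema validation, so it runs without dependencies. `Person`, `validatePerson`, and `parsePerson` are hypothetical names, not part of OrkaJS.

```typescript
// Hand-written stand-in for schema validation; StructuredOutputParser uses Zod instead.
interface Person {
  name: string;
  age: number;
}

// Returns a list of problems; an empty list means the value is valid.
function validatePerson(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) return ['expected an object'];
  const record = value as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof record.name !== 'string') errors.push('name: expected string');
  if (typeof record.age !== 'number') errors.push('age: expected number');
  return errors;
}

function parsePerson(text: string): Person {
  const value: unknown = JSON.parse(text);
  const errors = validatePerson(value);
  if (errors.length > 0) {
    throw new Error('Validation failed: ' + errors.join('; '));
  }
  return value as Person; // safe to narrow: every field was checked above
}

// JSON.parse alone accepts this silently; the wrong type surfaces later, far from the cause.
const unchecked = JSON.parse('{"name": "Alice", "age": "thirty"}');
console.log(typeof unchecked.age);

// The validated path fails immediately and names the offending field.
try {
  parsePerson('{"name": "Alice", "age": "thirty"}');
} catch (error) {
  console.error(error instanceof Error ? error.message : error);
}
```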

# ListParser

Parse lists from LLM outputs, automatically handling bullet points, numbers, and custom separators.

import { ListParser } from 'orkajs/parsers/list';
 
const parser = new ListParser({
  separator: '\n', // Split by newline (default)
  trim: true       // Remove whitespace
});
 
const llmOutput = `Here are the top programming languages:
 
- TypeScript
- Python
- Go
- Rust
`;
 
const items = parser.parse(llmOutput);
console.log(items);
// ['TypeScript', 'Python', 'Go', 'Rust']
 
// Works with numbered lists too
const numbered = `1. First item
2. Second item
3. Third item`;
 
const items2 = parser.parse(numbered);
// ['First item', 'Second item', 'Third item']
 
// Custom separator
const csvParser = new ListParser({ separator: ',' });
const csv = 'apple, banana, orange';
console.log(csvParser.parse(csv));
// ['apple', 'banana', 'orange']
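
For intuition, the cleanup shown above can be approximated in a few lines. This is a sketch under assumptions, not the library's code: `parseList` is a hypothetical name, and the rule for dropping the introductory line (keep only marked lines when any line has a bullet or number) is a guess at behavior consistent with the outputs above.

```typescript
// Hypothetical sketch; the option names mirror the ListParser options shown above.
interface ListOptions {
  separator?: string;
  trim?: boolean;
}

// Leading bullet ("-", "*", "•") or number ("1.", "2)") followed by whitespace.
const MARKER = /^\s*(?:[-*•]|\d+[.)])\s+/;

function parseList(text: string, options: ListOptions = {}): string[] {
  const { separator = '\n', trim = true } = options;
  const parts = text
    .split(separator)
    .map((part) => (trim ? part.trim() : part))
    .filter((part) => part.trim().length > 0); // drop empty entries

  // Assumption: if any entry carries a list marker, unmarked entries are
  // treated as surrounding prose (such as an intro sentence) and dropped.
  const hasMarkers = parts.some((part) => MARKER.test(part));
  return hasMarkers
    ? parts.filter((part) => MARKER.test(part)).map((part) => part.replace(MARKER, ''))
    : parts;
}
```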

# AutoFixParser

Wraps any parser and automatically retries with LLM correction when parsing fails.

import { AutoFixParser } from 'orkajs/parsers/auto-fix';
import { StructuredOutputParser } from 'orkajs/parsers/structured';
import { z } from 'zod';
 
const schema = z.object({
  name: z.string(),
  age: z.number()
});
 
const baseParser = StructuredOutputParser.fromZodSchema(schema);
 
const autoFixParser = new AutoFixParser({
  parser: baseParser,
  maxRetries: 3,
  llm: orka.getLLM()
});
 
// Malformed LLM output: "age" should be a number, not a string
const badOutput = `{
  "name": "Alice",
  "age": "thirty"
}`;
 
// Try to parse with auto-fix
try {
  const data = await autoFixParser.parseWithRetry(badOutput);
  console.log(data);
  // { name: 'Alice', age: 30 } ✅ Fixed automatically
} catch (error) {
  console.error('Failed after retries:', error);
}

🔄 How Auto-Fix Works

  1. Attempts to parse with base parser
  2. If parsing fails, sends error + original output to LLM
  3. LLM corrects the format
  4. Retries parsing with corrected output
  5. Repeats up to maxRetries times
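
The steps above can be sketched as a small retry loop. This illustrates the control flow only and is not AutoFixParser's source: `parseWithFix`, `Parse`, and `Fixer` are hypothetical names, and the `askLlmToFix` callback stands in for whatever prompt the library sends to the LLM.

```typescript
// Hypothetical types for illustration; not OrkaJS's actual interfaces.
type Parse<T> = (text: string) => T;
type Fixer = (badOutput: string, errorMessage: string) => Promise<string>;

async function parseWithFix<T>(
  output: string,
  parse: Parse<T>,
  askLlmToFix: Fixer,
  maxRetries = 3,
): Promise<T> {
  let current = output;
  for (let attempt = 0; ; attempt++) {
    try {
      return parse(current); // step 1 on the first pass, step 4 on later passes
    } catch (err) {
      if (attempt >= maxRetries) throw err; // step 5: retries exhausted, surface the last error
      const message = err instanceof Error ? err.message : String(err);
      // Steps 2-3: hand the failing text and the parser's error to the LLM.
      // Assumption: later retries send the most recent corrected text.
      current = await askLlmToFix(current, message);
    }
  }
}
```

With `maxRetries: 3`, this loop makes at most four parse attempts and three correction calls, which is why each retry adds latency and cost.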

# XMLParser

Parse XML-tagged outputs from LLMs. Useful when you need multiple named fields without JSON formatting; some LLMs handle XML tags more naturally than JSON.

import { XMLParser } from 'orkajs/parsers/xml';
 
// Basic usage — extract all XML tags
const parser = new XMLParser();
 
const llmOutput = `Here is my analysis:
 
<summary>The product has strong market potential</summary>
<sentiment>positive</sentiment>
<confidence>0.92</confidence>
<reasoning>Based on market trends and competitor analysis, the product fills a clear gap.</reasoning>`;
 
const data = parser.parse(llmOutput);
console.log(data);
// {
// summary: 'The product has strong market potential',
// sentiment: 'positive',
// confidence: '0.92',
// reasoning: 'Based on market trends and competitor analysis...'
// }
 
// Strict mode — require specific tags
const strictParser = new XMLParser({
  tags: ['summary', 'sentiment', 'confidence'],
  strict: true // Throws if any required tag is missing
});
 
const result = strictParser.parse(llmOutput);
// ✅ Validates that all required tags are present
 
// Get format instructions for the LLM
console.log(strictParser.getFormatInstructions());
// "Your response must use the following XML tags:
// <summary>value</summary>
// <sentiment>value</sentiment>
// <confidence>value</confidence>"
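
As a rough illustration of tag extraction and strict mode, the sketch below pulls flat `<tag>value</tag>` pairs out of free text. It is an assumption about one possible approach, not XMLParser's implementation: `parseXmlTags` is a hypothetical name, and nested tags and attributes are deliberately not handled.

```typescript
// Hypothetical sketch; the `tags` and `strict` options mirror the XMLParser options shown above.
function parseXmlTags(
  text: string,
  options: { tags?: string[]; strict?: boolean } = {},
): Record<string, string> {
  const result: Record<string, string> = {};
  // Match <tag>content</tag>; the backreference \1 requires the closing tag to match the opening one.
  const pattern = /<([A-Za-z_][\w-]*)>([\s\S]*?)<\/\1>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    result[match[1]] = match[2].trim(); // values stay strings, e.g. '0.92'
  }

  if (options.strict) {
    const missing = (options.tags ?? []).filter((tag) => !(tag in result));
    if (missing.length > 0) {
      throw new Error('Missing required XML tags: ' + missing.join(', '));
    }
  }
  return result;
}
```

Because every value comes back as a string, numeric fields such as `confidence` still need an explicit conversion, for example `Number(data.confidence)`.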

# CSVParser

Parse CSV-formatted outputs into arrays of objects. Handles quoted fields, custom separators, and optional predefined headers. Ideal for tabular data extraction from LLMs.

import { CSVParser } from 'orkajs/parsers/csv';
 
// Auto-detect headers from first row
const parser = new CSVParser();
 
const llmOutput = `name,role,experience
Alice,Engineer,5 years
Bob,Designer,3 years
Charlie,Manager,8 years`;
 
const data = parser.parse(llmOutput);
console.log(data);
// [
// { name: 'Alice', role: 'Engineer', experience: '5 years' },
// { name: 'Bob', role: 'Designer', experience: '3 years' },
// { name: 'Charlie', role: 'Manager', experience: '8 years' }
// ]
 
// Predefined headers (no header row in data)
const noHeaderParser = new CSVParser({
  headers: ['product', 'price', 'stock'],
  separator: ';', // Custom separator
  strict: true    // Enforce column count
});
 
const semicolonData = `iPhone;999;true
MacBook;1999;false`;
 
console.log(noHeaderParser.parse(semicolonData));
// [
// { product: 'iPhone', price: '999', stock: 'true' },
// { product: 'MacBook', price: '1999', stock: 'false' }
// ]
 
// Handles quoted fields with commas
const quotedCSV = `name,description
"Smith, John","Senior engineer, 10+ years"`;
console.log(parser.parse(quotedCSV));
// [{ name: 'Smith, John', description: 'Senior engineer, 10+ years' }]
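
Quoted fields are the part of CSV parsing that a plain `split(',')` gets wrong. The sketch below shows one way to handle them with a character-by-character scan. It is illustrative only: `splitCsvLine` and `parseCsv` are hypothetical helpers, a single-character separator is assumed, and quoted fields spanning multiple lines are not supported.

```typescript
// Hypothetical helpers for illustration; not OrkaJS's actual implementation.

// Split one CSV line into fields, treating separators inside double quotes as literal text.
function splitCsvLine(line: string, separator = ','): string[] {
  const fields: string[] = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // a doubled quote inside a quoted field is an escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === separator) {
      fields.push(field.trim());
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field.trim());
  return fields;
}

// Turn CSV text into row objects; the first row supplies headers unless `headers` is given.
function parseCsv(text: string, headers?: string[], separator = ','): Record<string, string>[] {
  const rows = text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => splitCsvLine(line, separator));
  const columns = headers ?? rows.shift() ?? [];
  return rows.map((row) => {
    const record: Record<string, string> = {};
    columns.forEach((name, i) => {
      record[name] = row[i] ?? ''; // missing cells become empty strings
    });
    return record;
  });
}
```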

# CommaSeparatedListParser

A specialized parser for comma-separated lists. Simpler than CSVParser when you just need a flat list of values. Supports deduplication and automatic trimming.

import { CommaSeparatedListParser } from 'orkajs/parsers/comma-separated-list';
 
const parser = new CommaSeparatedListParser({
  trim: true,             // Remove whitespace (default: true)
  removeDuplicates: false // Keep duplicates (default: false)
});
 
const llmOutput = 'TypeScript, Python, Go, Rust, JavaScript';
const items = parser.parse(llmOutput);
console.log(items);
// ['TypeScript', 'Python', 'Go', 'Rust', 'JavaScript']
 
// With deduplication
const deduper = new CommaSeparatedListParser({ removeDuplicates: true });
const dupes = 'apple, banana, apple, orange, banana';
console.log(deduper.parse(dupes));
// ['apple', 'banana', 'orange']
 
// Format instructions for the LLM
console.log(parser.getFormatInstructions());
// "Your response must be a comma-separated list of values.
// Example: item1, item2, item3"
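
The trimming and deduplication options amount to very little code, as the sketch below suggests. It is an approximation for illustration, not the library's source: `parseCommaList` is a hypothetical name, and throwing on an empty result is an assumption.

```typescript
// Hypothetical sketch; the option names mirror the CommaSeparatedListParser options shown above.
function parseCommaList(
  text: string,
  options: { trim?: boolean; removeDuplicates?: boolean } = {},
): string[] {
  const { trim = true, removeDuplicates = false } = options;
  const items = text
    .split(',')
    .map((item) => (trim ? item.trim() : item))
    .filter((item) => item.trim() !== ''); // ignore empty entries such as "a,,b"

  if (items.length === 0) {
    throw new Error('Expected a non-empty comma-separated list'); // assumed behavior
  }
  if (!removeDuplicates) return items;

  // Keep the first occurrence of each value and preserve the original order.
  const seen = new Set<string>();
  return items.filter((item) => {
    if (seen.has(item)) return false;
    seen.add(item);
    return true;
  });
}
```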

Complete Example

import { createOrka, OpenAIAdapter } from 'orkajs';
import { StructuredOutputParser } from 'orkajs/parsers/structured';
import { AutoFixParser } from 'orkajs/parsers/auto-fix';
import { z } from 'zod';
 
const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: /* ... */
});
 
// Define schema
const productSchema = z.object({
  name: z.string(),
  price: z.number(),
  category: z.enum(['electronics', 'clothing', 'food']),
  inStock: z.boolean(),
  tags: z.array(z.string())
});
 
// Create parser with auto-fix
const baseParser = StructuredOutputParser.fromZodSchema(productSchema);
const parser = new AutoFixParser({
  parser: baseParser,
  maxRetries: 2,
  llm: orka.getLLM()
});
 
// Generate structured output
const prompt = `Extract product information from this description:
"The iPhone 15 Pro costs $999 and is currently available.
It's an electronics item with tags: smartphone, apple, 5g"
 
${baseParser.getFormatInstructions()}`;
 
const response = await orka.generate(prompt);
 
// Parse with validation and auto-fix
const product = await parser.parseWithRetry(response);
 
console.log(product);
// {
// name: 'iPhone 15 Pro',
// price: 999,
// category: 'electronics',
// inStock: true,
// tags: ['smartphone', 'apple', '5g']
// }
// ✅ Type-safe, validated, and auto-corrected if needed

Comparison

| Parser | Use Case | Validation |
| --- | --- | --- |
| JSONParser | Simple JSON extraction | Basic JSON syntax |
| StructuredOutputParser | Type-safe structured data | Zod schema validation |
| ListParser | Lists, arrays, enumerations | Format cleaning |
| AutoFixParser | Unreliable LLM outputs | LLM-powered correction |
| XMLParser | Multi-field extraction with XML tags | Tag presence (strict mode) |
| CSVParser | Tabular data extraction | Column count (strict mode) |
| CommaSeparatedListParser | Simple comma-separated values | Non-empty list |

Best Practices

1. Include Format Instructions

Always add parser.getFormatInstructions() to your prompts to guide the LLM.

2. Use Zod for Complex Schemas

StructuredOutputParser with Zod provides type safety, validation, and clear error messages.

3. Use AutoFix Sparingly

AutoFixParser makes extra LLM calls. Use it for critical data or when LLM outputs are unreliable.

Tree-shaking Imports

// ✅ Import only what you need
import { StructuredOutputParser } from 'orkajs/parsers/structured';
import { AutoFixParser } from 'orkajs/parsers/auto-fix';
import { XMLParser } from 'orkajs/parsers/xml';
import { CSVParser } from 'orkajs/parsers/csv';
import { CommaSeparatedListParser } from 'orkajs/parsers/comma-separated-list';
 
// ✅ Or import from index
import { JSONParser, ListParser, XMLParser, CSVParser } from 'orkajs/parsers';