Chunking
Understand how Orka AI splits documents into optimally sized chunks for embedding and retrieval.
How Chunking Works
Orka AI uses a recursive text splitter that cuts text at the most natural boundary available, trying separators in this order:
1. Try paragraph breaks (\n\n) first
2. Fall back to line breaks (\n)
3. Then sentence boundaries (. )
4. Then word boundaries ( )
5. Finally, character-by-character

Configuration

    await orka.knowledge.create({
      name: 'docs',
      source: myContent,
      chunkSize: 1000,   // Max characters per chunk
      chunkOverlap: 200, // Overlap between consecutive chunks
    });

Recommended Sizes
| Content Type | Chunk Size (characters) | Overlap (characters) | Why |
|---|---|---|---|
| FAQ / Q&A | 300–500 | 50–100 | Each Q&A is self-contained |
| Technical docs | 800–1200 | 150–250 | Preserves code blocks |
| Long articles | 1000–1500 | 200–300 | Balances context & specificity |
| Legal / contracts | 500–800 | 200 | Precise clause retrieval |
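
To get a feel for what these settings mean in practice, the number of chunks a document produces can be estimated from its length. The sketch below assumes plain fixed-size windows, which is a simplification: a recursive splitter cuts at natural boundaries, so real chunks are often shorter and the true count somewhat higher. `estimateChunkCount` is a hypothetical helper for illustration, not part of the Orka AI API.

```typescript
// Hypothetical helper (not an Orka AI function): rough chunk count for a
// document, assuming every chunk is exactly chunkSize characters long.
function estimateChunkCount(
  textLength: number,
  chunkSize: number,
  chunkOverlap: number,
): number {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  if (textLength <= 0) return 0;
  if (textLength <= chunkSize) return 1;

  // Each chunk after the first advances by (chunkSize - chunkOverlap) characters.
  const stride = chunkSize - chunkOverlap;
  return Math.ceil((textLength - chunkOverlap) / stride);
}

// A 10,000-character document with the settings from the Configuration example:
console.log(estimateChunkCount(10_000, 1000, 200)); // 13
// The same document with no overlap:
console.log(estimateChunkCount(10_000, 1000, 0)); // 10
```

Under this assumption, a 200-character overlap on 1000-character chunks adds roughly 30% more chunks to embed and store, which is the cost side of the sizes recommended above.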
💡 Why Overlap?
Overlap ensures that information at chunk boundaries isn't lost. When a chunk ends mid-sentence, the next chunk begins up to chunkOverlap characters before that cut, so the text around the boundary appears in both chunks and can be retrieved with its context.
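
The separator fallback from How Chunking Works and the overlap described here can be sketched together. This is a minimal illustration under stated assumptions, not Orka AI's actual implementation: the names `SEPARATORS`, `splitRecursive`, and `chunkText` are hypothetical, separators are kept attached to the text before them, and overlap is applied at the granularity of whole pieces, so the carried-over text can be shorter than chunkOverlap.

```typescript
// Illustrative only: these names are hypothetical, not part of the Orka AI SDK.

// Separator order from "How Chunking Works"; '' stands for character-by-character.
const SEPARATORS: string[] = ['\n\n', '\n', '. ', ' ', ''];

// Break text into pieces of at most chunkSize characters, trying the coarsest
// separator first and falling back to finer ones for pieces that are too long.
function splitRecursive(
  text: string,
  chunkSize: number,
  separators: string[] = SEPARATORS,
): string[] {
  if (text.length === 0) return [];
  if (text.length <= chunkSize) return [text];

  const sep = separators[0] ?? '';
  const rest = separators.slice(1);

  if (sep === '') {
    // Last resort: hard cuts every chunkSize characters.
    const cuts: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      cuts.push(text.slice(i, i + chunkSize));
    }
    return cuts;
  }

  const parts = text.split(sep);
  const pieces: string[] = [];
  parts.forEach((part, i) => {
    // Keep the separator attached to the preceding part so no characters are lost.
    const piece = i < parts.length - 1 ? part + sep : part;
    pieces.push(...splitRecursive(piece, chunkSize, rest));
  });
  return pieces;
}

// Greedily merge pieces into chunks of at most chunkSize characters, carrying
// trailing pieces (up to chunkOverlap characters) into the next chunk.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  const chunks: string[] = [];
  const current: string[] = [];
  let length = 0;

  for (const piece of splitRecursive(text, chunkSize)) {
    if (current.length > 0 && length + piece.length > chunkSize) {
      chunks.push(current.join(''));
      // Drop pieces from the front until the carry-over fits within
      // chunkOverlap and leaves room for the incoming piece.
      while (length > chunkOverlap || (length > 0 && length + piece.length > chunkSize)) {
        const dropped = current.shift();
        length -= dropped === undefined ? 0 : dropped.length;
      }
    }
    current.push(piece);
    length += piece.length;
  }
  if (current.length > 0) chunks.push(current.join(''));
  return chunks;
}

console.log(chunkText('aaaa bbbb cccc dddd', 10, 5));
// [ 'aaaa bbbb ', 'bbbb cccc ', 'cccc dddd' ]
```

In the sample output, each chunk repeats the last word of the previous one, which is the boundary-preserving effect of overlap. With chunkOverlap set to 0, the same input yields two chunks that concatenate back to the original text.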