
Chunking

Understand how Orka AI splits documents into optimally-sized chunks for embedding and retrieval.

How Chunking Works

Orka AI uses a recursive text splitter that cuts text at natural boundaries, trying coarser separators before finer ones:

1. Try paragraph breaks (\n\n) first
2. Fall back to line breaks (\n)
3. Then sentence boundaries (. )
4. Then word boundaries ( )
5. Finally, character-by-character
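
The fallback order above can be sketched in a few lines of TypeScript. This is a minimal illustration of the general recursive-splitting technique, not Orka AI's actual implementation: the names `recursiveSplit`, `splitKeepingSeparator`, and `SEPARATORS` are hypothetical, and the sketch leaves out overlap entirely.

```typescript
// Separators from coarsest to finest, mirroring the list above.
// The empty string stands for "cut character by character".
const SEPARATORS = ["\n\n", "\n", ". ", " ", ""];

// Split on `sep`, but keep the separator attached to the piece before it,
// so that concatenating the pieces reproduces the original text.
function splitKeepingSeparator(text: string, sep: string): string[] {
  const parts = text.split(sep);
  return parts
    .map((part, i) => (i < parts.length - 1 ? part + sep : part))
    .filter((part) => part.length > 0);
}

// Hypothetical sketch: greedily pack pieces into chunks of at most
// `chunkSize` characters, recursing with finer separators when a single
// piece is still too large.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = SEPARATORS,
): string[] {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  if (text.length <= chunkSize) return text.length > 0 ? [text] : [];

  const [sep, ...finer] = separators;
  if (sep === undefined || sep === "") {
    // Last resort: hard cuts every `chunkSize` characters.
    const cuts: string[] = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      cuts.push(text.slice(i, i + chunkSize));
    }
    return cuts;
  }

  const chunks: string[] = [];
  let current = "";
  for (const piece of splitKeepingSeparator(text, sep)) {
    if (current.length + piece.length <= chunkSize) {
      current += piece; // still fits: keep packing
      continue;
    }
    if (current !== "") chunks.push(current);
    current = "";
    if (piece.length <= chunkSize) current = piece;
    else chunks.push(...recursiveSplit(piece, chunkSize, finer));
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Because separators stay attached to the preceding piece, joining the chunks gives back the input unchanged. A production splitter would typically also trim whitespace and drop whitespace-only chunks.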

Configuration

await orka.knowledge.create({
  name: 'docs',
  source: myContent,
  chunkSize: 1000,   // Max characters per chunk
  chunkOverlap: 200, // Overlap between consecutive chunks
});
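
To get a feel for how the two settings interact, the sketch below estimates how many chunks a document produces. It assumes plain fixed-size windows that advance by `chunkSize - chunkOverlap` characters. Orka AI cuts at natural boundaries, so real chunks are often shorter than `chunkSize` and the true count may be higher; treat the result as a rough lower bound. The function name `estimateChunkCount` is hypothetical.

```typescript
// Hypothetical helper: estimate the chunk count for fixed-size windows.
// Each new window advances by (chunkSize - chunkOverlap) characters.
function estimateChunkCount(
  textLength: number,
  chunkSize: number,
  chunkOverlap: number,
): number {
  if (chunkSize < 1 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("expected chunkSize >= 1 and 0 <= chunkOverlap < chunkSize");
  }
  if (textLength <= 0) return 0;
  if (textLength <= chunkSize) return 1;
  const stride = chunkSize - chunkOverlap;
  // One window for the first chunkSize characters, then one per stride.
  return 1 + Math.ceil((textLength - chunkSize) / stride);
}

// With the settings above, a 5,000-character document gives
// 1 + ceil((5000 - 1000) / 800) = 6 windows.
const estimated = estimateChunkCount(5000, 1000, 200);
```

A larger overlap shrinks the stride, which raises the chunk count and therefore the number of embeddings to compute and store.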

Recommended Sizes

| Content Type | Chunk Size (characters) | Overlap (characters) | Why |
|---|---|---|---|
| FAQ / Q&A | 300–500 | 50–100 | Each Q&A is self-contained |
| Technical docs | 800–1200 | 150–250 | Preserves code blocks |
| Long articles | 1000–1500 | 200–300 | Balances context & specificity |
| Legal / contracts | 500–800 | 200 | Precise clause retrieval |
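
One way to apply these recommendations in code is a small lookup table. The sketch below is illustrative only: the `ContentType` keys, the `PRESETS` table, and `suggestChunking` are hypothetical names, and picking the midpoint of each range is an assumption made here, not an Orka AI default.

```typescript
type ContentType = "faq" | "technical-docs" | "long-article" | "legal";

interface Range {
  min: number;
  max: number;
}

interface ChunkingPreset {
  chunkSize: Range;
  chunkOverlap: Range;
}

// Ranges copied from the table above (all values in characters).
const PRESETS: Record<ContentType, ChunkingPreset> = {
  "faq": { chunkSize: { min: 300, max: 500 }, chunkOverlap: { min: 50, max: 100 } },
  "technical-docs": { chunkSize: { min: 800, max: 1200 }, chunkOverlap: { min: 150, max: 250 } },
  "long-article": { chunkSize: { min: 1000, max: 1500 }, chunkOverlap: { min: 200, max: 300 } },
  "legal": { chunkSize: { min: 500, max: 800 }, chunkOverlap: { min: 200, max: 200 } },
};

// Assumption: use the midpoint of each range as a starting value.
function midpoint(range: Range): number {
  return Math.round((range.min + range.max) / 2);
}

function suggestChunking(type: ContentType): { chunkSize: number; chunkOverlap: number } {
  const preset = PRESETS[type];
  return {
    chunkSize: midpoint(preset.chunkSize),
    chunkOverlap: midpoint(preset.chunkOverlap),
  };
}

// The result has the same shape as the chunking options shown earlier, e.g.:
//   await orka.knowledge.create({ name: 'docs', source: myContent, ...suggestChunking("technical-docs") });
```

These values are starting points; retrieval quality on your own content is the better guide for tuning within each range.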

💡 Why Overlap?

Overlap ensures information at chunk boundaries isn't lost. When a chunk ends mid-sentence, the next chunk starts chunkOverlap characters before that cut (200 in the configuration above), so the text around the boundary appears in both chunks.
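
The effect is easiest to see with plain fixed-size windows. The sketch below ignores natural boundaries and simply slides a window over the text, which is a simplification of what Orka AI does; `slidingWindows` is a hypothetical name used only for illustration.

```typescript
// Hypothetical helper: fixed-size windows where each window starts
// `chunkOverlap` characters before the previous window ended.
function slidingWindows(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize < 1 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("expected chunkSize >= 1 and 0 <= chunkOverlap < chunkSize");
  }
  const stride = chunkSize - chunkOverlap;
  const windows: string[] = [];
  for (let start = 0; start < text.length; start += stride) {
    windows.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return windows;
}

// Tiny example: 4-character windows with a 2-character overlap.
const windows = slidingWindows("abcdefghij", 4, 2);
// The last 2 characters of each window are the first 2 of the next one.
const sharedTail = windows[0].slice(-2);
const sharedHead = windows[1].slice(0, 2);
```

A sentence cut at the end of one window is only recovered in full if the part that precedes the cut is shorter than the overlap, which is why denser text such as contracts is paired with a relatively large overlap in the table above.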