OrkaJS

Advanced Retrievers

Improve retrieval quality with multi-query expansion, contextual compression, and ensemble methods.

# Why Advanced Retrievers?

Basic vector search can miss relevant documents due to query phrasing or semantic gaps. Advanced retrievers improve recall and precision through query expansion, result compression, and fusion techniques.

# MultiQueryRetriever

Generates multiple alternative queries using an LLM, retrieves results for each, and deduplicates to improve recall.

```typescript
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';
import { createOrka, OpenAIAdapter, PineconeAdapter } from 'orkajs';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

const retriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,              // Generate 3 alternative queries
  topK: 5,                    // Return top 5 results per query
  deduplicateByContent: true  // Remove duplicate results
});

const results = await retriever.retrieve(
  'How do I configure Orka AI?',
  'my-knowledge-base'
);

// Returns deduplicated results from all query variations
```

🎯 How It Works

  1. Original query: "How do I configure Orka AI?"
  2. LLM generates alternatives:
    • "What are the configuration options for Orka AI?"
    • "How to set up Orka AI settings?"
    • "Orka AI configuration guide"
  3. Searches with all queries
  4. Deduplicates and ranks results
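Step 4 can be sketched as follows. This is an illustrative implementation of what `deduplicateByContent` implies, not the library's internals; `dedupeByContent` and `RetrievedDoc` are hypothetical names. Results from all query variations are merged, duplicates collapsed by content (keeping the best score), and re-ranked:

```typescript
interface RetrievedDoc { content: string; score: number; }

// Merge result batches from each query variation, keeping the
// highest-scoring copy of each unique content string.
function dedupeByContent(batches: RetrievedDoc[][]): RetrievedDoc[] {
  const best = new Map<string, RetrievedDoc>();
  for (const batch of batches) {
    for (const doc of batch) {
      const existing = best.get(doc.content);
      if (!existing || doc.score > existing.score) {
        best.set(doc.content, doc);
      }
    }
  }
  // Rank highest score first
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```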

# ContextualCompressionRetriever

Retrieves more documents than needed, then uses an LLM to extract only the relevant parts, improving precision and reducing context size.

```typescript
import { ContextualCompressionRetriever } from 'orkajs/retrievers/compression';

const retriever = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10,                 // Retrieve 10 documents
  maxCompressedLength: 500  // Compress each to ~500 chars
});

const results = await retriever.retrieve(
  'What are the benefits of RAG?',
  'my-knowledge-base'
);

// Each result contains only the relevant extract, not the full document
```

Without Compression

Returns full documents (1,000+ chars each), including irrelevant sections that waste context-window tokens.

With Compression

Returns only relevant extracts (200-500 chars), maximizing context efficiency and relevance.
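Conceptually, compression runs one extraction prompt per retrieved document. This is a hypothetical sketch of what such a prompt could look like; `buildCompressionPrompt` is an illustrative name and the retriever's actual prompt may differ:

```typescript
// Build an extraction prompt asking the LLM to keep only the
// passages relevant to the query, bounded by maxCompressedLength.
function buildCompressionPrompt(query: string, document: string, maxLength: number): string {
  return [
    'Extract only the passages from the document that are relevant to the question.',
    `Return at most ${maxLength} characters. Reply with NONE if nothing is relevant.`,
    '',
    `Question: ${query}`,
    `Document: ${document}`
  ].join('\n');
}
```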

# EnsembleRetriever

Combines multiple retrievers using Reciprocal Rank Fusion (RRF) for improved results. Useful for combining different retrieval strategies.

```typescript
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';
import { VectorRetriever } from 'orkajs/retrievers/vector';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';

// Create individual retrievers
const vectorRetriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

const multiQueryRetriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Combine with weighted fusion
const ensemble = new EnsembleRetriever({
  retrievers: [vectorRetriever, multiQueryRetriever],
  weights: [0.4, 0.6],  // 40% vector, 60% multi-query
  topK: 5               // Return top 5 fused results
});

const results = await ensemble.retrieve(
  'Explain RAG architecture',
  'my-knowledge-base'
);
```

🔬 Reciprocal Rank Fusion (RRF)

RRF combines rankings from multiple sources by giving higher scores to documents that appear in top positions across multiple retrievers.

```typescript
// Formula: score = weight * (1 / (rank + 60))
// Document at rank 1 in Retriever A: 0.4 * (1/61) = 0.0066
// Same document at rank 3 in Retriever B: 0.6 * (1/63) = 0.0095
// Final fusion score: 0.0066 + 0.0095 = 0.0161
```
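The arithmetic above can be run as a small sketch. `fuse` and the ranked-list shape are illustrative, not the library API; note the formula uses 1-based ranks while arrays are 0-indexed:

```typescript
type RankedList = string[]; // document ids, best first

// Weighted Reciprocal Rank Fusion: each retriever contributes
// weight * (1 / (rank + k)) per document, with k = 60 by convention.
function fuse(lists: RankedList[], weights: number[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  lists.forEach((list, i) => {
    list.forEach((id, index) => {
      const rank = index + 1; // convert 0-based index to 1-based rank
      const contribution = weights[i] * (1 / (rank + k));
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  });
  return scores;
}
```

With the weights from the example (`[0.4, 0.6]`), a document at rank 1 in the first list and rank 3 in the second fuses to ≈ 0.0161, matching the worked numbers above.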

# VectorRetriever

Basic vector search wrapper that implements the Retriever interface. Useful as a building block for ensemble retrievers.

```typescript
import { VectorRetriever } from 'orkajs/retrievers/vector';

const retriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  minScore: 0.7  // Filter results below 0.7 similarity
});

const results = await retriever.retrieve(
  'What is RAG?',
  'my-knowledge-base'
);
```

# ParentDocumentRetriever

Searches on small child chunks for precision, then returns the full parent document for context. This solves the classic trade-off: small chunks are better for search accuracy, but large chunks provide more context for the LLM.

```typescript
import { ParentDocumentRetriever } from 'orkajs/retrievers/parent-document';

const retriever = new ParentDocumentRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  childTopK: 10,  // Search top 10 child chunks
  parentTopK: 3,  // Return top 3 parent documents
  minScore: 0.6
});

const results = await retriever.retrieve(
  'How does authentication work?',
  'documentation'
);

// Returns full parent documents, ranked by best child chunk score
// Each result includes metadata: { childCount, parentContent, ... }
```

📐 How It Works

  1. Index documents as small chunks with parentId in metadata
  2. Search finds the most relevant child chunks
  3. Groups children by parent document
  4. Returns full parent content, ranked by best child score
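Steps 3 and 4 can be sketched as a grouping pass. This is illustrative code, not the retriever's internals; `ChildHit` and `rankParents` are hypothetical names:

```typescript
interface ChildHit { parentId: string; score: number; }

// Group matched child chunks by parentId, then rank parents by
// their best child score and keep the top parentTopK.
function rankParents(
  hits: ChildHit[],
  parentTopK: number
): { parentId: string; score: number; childCount: number }[] {
  const byParent = new Map<string, { score: number; childCount: number }>();
  for (const h of hits) {
    const cur = byParent.get(h.parentId);
    if (!cur) {
      byParent.set(h.parentId, { score: h.score, childCount: 1 });
    } else {
      cur.score = Math.max(cur.score, h.score);
      cur.childCount += 1;
    }
  }
  return [...byParent.entries()]
    .map(([parentId, v]) => ({ parentId, ...v }))
    .sort((a, b) => b.score - a.score)
    .slice(0, parentTopK);
}
```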

# SelfQueryRetriever

Uses an LLM to automatically extract metadata filters from natural language queries. Instead of just semantic search, it combines meaning-based search with structured metadata filtering for more precise results.

```typescript
import { SelfQueryRetriever } from 'orkajs/retrievers/self-query';

const retriever = new SelfQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  metadataFields: [
    {
      name: 'category',
      type: 'string',
      description: 'The document category',
      enumValues: ['tutorial', 'api-reference', 'guide', 'changelog']
    },
    {
      name: 'language',
      type: 'string',
      description: 'Programming language',
      enumValues: ['typescript', 'python', 'javascript']
    },
    {
      name: 'version',
      type: 'number',
      description: 'The version number of the documentation'
    }
  ]
});

// Natural language query with implicit filters
const results = await retriever.retrieve(
  'Show me TypeScript tutorials about authentication in version 3',
  'documentation'
);

// The LLM extracts:
// semanticQuery: "authentication"
// filter: { language: "typescript", category: "tutorial", version: 3 }
```

🧠 Query Decomposition Example

The LLM automatically separates the semantic meaning from the structured filters:

```jsonc
// User query: "Find Python guides about deployment from 2024"
// LLM extracts:
{
  "semanticQuery": "deployment",
  "filter": {
    "language": "python",
    "category": "guide"
  }
}
// "from 2024" has no declared metadata field, so it is not filtered on
```
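Since the filter comes from an LLM, one defensive pattern is to validate it against the declared `metadataFields` before querying. This is an illustrative sketch, not part of the library; `sanitizeFilter` is a hypothetical helper:

```typescript
interface MetadataField {
  name: string;
  type: 'string' | 'number';
  enumValues?: string[];
}

// Drop any extracted filter key that is undeclared, has the wrong
// type, or uses a value outside the declared enumValues.
function sanitizeFilter(
  filter: Record<string, unknown>,
  fields: MetadataField[]
): Record<string, unknown> {
  const byName = new Map(fields.map(f => [f.name, f]));
  const clean: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(filter)) {
    const field = byName.get(key);
    if (!field || typeof value !== field.type) continue;
    if (field.enumValues && !field.enumValues.includes(value as string)) continue;
    clean[key] = value;
  }
  return clean;
}
```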

# BM25Retriever

A keyword-based retriever using the BM25 (Best Matching 25) algorithm. Unlike vector search which relies on semantic similarity, BM25 uses term frequency and inverse document frequency for exact keyword matching. Perfect for combining with vector search in an EnsembleRetriever.

```typescript
import { BM25Retriever } from 'orkajs/retrievers/bm25';

const bm25 = new BM25Retriever({
  documents: [
    { id: '1', content: 'TypeScript is a typed superset of JavaScript...', metadata: { source: 'docs' } },
    { id: '2', content: 'React hooks allow you to use state in functional components...', metadata: { source: 'blog' } },
    { id: '3', content: 'Node.js is a JavaScript runtime built on Chrome V8...', metadata: { source: 'docs' } },
  ],
  topK: 5,
  k1: 1.5,  // Term frequency saturation (default: 1.5)
  b: 0.75   // Document length normalization (default: 0.75)
});

const results = await bm25.retrieve('JavaScript runtime', 'any');
// Finds documents with exact keyword matches for "JavaScript" and "runtime"

// Add more documents dynamically
bm25.addDocuments([
  { id: '4', content: 'Deno is a modern JavaScript/TypeScript runtime...' }
]);
```
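The scoring behind the `k1` and `b` options can be sketched with a minimal BM25 implementation. This illustrates the standard Okapi BM25 formula, not the retriever's actual code; `tokenize` and `bm25Scores` are hypothetical names:

```typescript
// Naive tokenizer: lowercase, split on non-word characters
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Score every document against the query with Okapi BM25.
function bm25Scores(docs: string[], query: string, k1 = 1.5, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const avgdl = tokenized.reduce((sum, d) => sum + d.length, 0) / docs.length;
  const terms = tokenize(query);
  return tokenized.map(doc => {
    let score = 0;
    for (const term of terms) {
      const df = tokenized.filter(d => d.includes(term)).length;
      if (df === 0) continue;
      // Inverse document frequency: rarer terms weigh more
      const idf = Math.log((docs.length - df + 0.5) / (df + 0.5) + 1);
      const tf = doc.filter(t => t === term).length;
      // k1 saturates term frequency; b normalizes by document length
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgdl));
    }
    return score;
  });
}
```

Raising `k1` lets repeated terms keep adding to the score; lowering `b` reduces the penalty on long documents.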

# BM25 + Vector Search (Hybrid)

The most powerful retrieval strategy combines BM25 (keyword matching) with vector search (semantic understanding) using the EnsembleRetriever:

```typescript
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';
import { VectorRetriever } from 'orkajs/retrievers/vector';
import { BM25Retriever } from 'orkajs/retrievers/bm25';

// Keyword-based retrieval
const bm25 = new BM25Retriever({
  documents: myDocuments,
  topK: 10
});

// Semantic retrieval
const vector = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

// Hybrid: combine both with Reciprocal Rank Fusion
const hybrid = new EnsembleRetriever({
  retrievers: [bm25, vector],
  weights: [0.3, 0.7],  // 30% keyword, 70% semantic
  topK: 5
});

const results = await hybrid.retrieve('authentication middleware', 'docs');
// Finds docs matching keywords AND semantically similar content
```

# Comparison

| Retriever | Improves | Trade-off |
| --- | --- | --- |
| MultiQuery | Recall (finds more relevant docs) | Extra LLM calls |
| Compression | Precision (removes noise) | LLM calls for compression |
| Ensemble | Both recall & precision | Multiple retrieval passes |
| Vector | Speed (single pass) | May miss variations |
| ParentDocument | Context (full doc with precise search) | Requires `parentId` metadata |
| SelfQuery | Filtering (structured + semantic) | LLM call for query parsing |
| BM25 | Keyword matching (exact terms) | No semantic understanding |

# Complete Example

```typescript
import { createOrka, OpenAIAdapter, PineconeAdapter } from 'orkajs';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';
import { ContextualCompressionRetriever } from 'orkajs/retrievers/compression';
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

// Strategy 1: Multi-query for better recall
const multiQuery = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Strategy 2: Compression for better precision
const compression = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 15,
  maxCompressedLength: 400
});

// Combine both strategies
const ensemble = new EnsembleRetriever({
  retrievers: [multiQuery, compression],
  weights: [0.5, 0.5],
  topK: 5
});

// Retrieve with the best of both worlds
const results = await ensemble.retrieve(
  'How does RAG improve LLM responses?',
  'documentation'
);

console.log(`Found ${results.length} highly relevant results`);
results.forEach(r => {
  console.log(`Score: ${r.score.toFixed(3)}`);
  console.log(`Content: ${r.content?.slice(0, 100)}...`);
});
```

# Best Practices

1. Start Simple

Begin with VectorRetriever. Add MultiQuery if recall is low. Add Compression if precision is low.

2. Monitor Costs

MultiQuery and Compression make extra LLM calls. Use caching or limit query count in production.

3. Tune Weights

Experiment with ensemble weights based on your use case. Higher weight = more influence on final ranking.

# Tree-shaking Imports

```typescript
// ✅ Import only what you need
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';

// ✅ Or import from index
import { MultiQueryRetriever, ContextualCompressionRetriever } from 'orkajs/retrievers';
```