Advanced Retrievers
Improve retrieval quality with multi-query expansion, contextual compression, and ensemble methods.
# Why Advanced Retrievers?
Basic vector search can miss relevant documents due to query phrasing or semantic gaps. Advanced retrievers improve recall and precision through query expansion, result compression, and fusion techniques.
# MultiQueryRetriever
Generates multiple alternative queries using an LLM, retrieves results for each, and deduplicates to improve recall.
```typescript
import { createOrka, OpenAIAdapter, PineconeAdapter } from 'orkajs';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

const retriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,             // Generate 3 alternative queries
  topK: 5,                   // Return top 5 results per query
  deduplicateByContent: true // Remove duplicate results
});

const results = await retriever.retrieve(
  'How do I configure Orka AI?',
  'my-knowledge-base'
);
// Returns deduplicated results from all query variations
```

## 🎯 How It Works
1. Original query: "How do I configure Orka AI?"
2. LLM generates alternatives:
   - "What are the configuration options for Orka AI?"
   - "How to set up Orka AI settings?"
   - "Orka AI configuration guide"
3. Searches with all queries
4. Deduplicates and ranks results
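The final merge-and-deduplicate step needs no LLM or vector store, so it can be sketched on its own. The shapes and function below are illustrative, not the orkajs internals: given one result list per query variant, keep each unique content string once, with its best score.

```typescript
// Minimal sketch of the multi-query merge step (illustrative names, not
// the orkajs internals): union the per-query result lists, drop duplicates
// by content, and keep each document's highest score across variants.
interface RetrievedDoc {
  id: string;
  content: string;
  score: number;
}

function mergeAndDeduplicate(resultSets: RetrievedDoc[][]): RetrievedDoc[] {
  const byContent = new Map<string, RetrievedDoc>();
  for (const results of resultSets) {
    for (const doc of results) {
      const existing = byContent.get(doc.content);
      // Keep whichever copy scored highest across the query variants
      if (!existing || doc.score > existing.score) {
        byContent.set(doc.content, doc);
      }
    }
  }
  // Rank the merged pool by score, best first
  return [...byContent.values()].sort((a, b) => b.score - a.score);
}
```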
# ContextualCompressionRetriever
Retrieves more documents than needed, then uses an LLM to extract only the relevant parts, improving precision and reducing context size.
```typescript
import { ContextualCompressionRetriever } from 'orkajs/retrievers/compression';

const retriever = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10,                // Retrieve 10 documents
  maxCompressedLength: 500 // Compress each to ~500 chars
});

const results = await retriever.retrieve(
  'What are the benefits of RAG?',
  'my-knowledge-base'
);
// Each result contains only the relevant extract, not the full document
```

## ❌ Without Compression
Returns full documents (1000+ chars each) with irrelevant sections, wasting context window.
## ✅ With Compression
Returns only relevant extracts (200-500 chars), maximizing context efficiency and relevance.
# EnsembleRetriever
Combines multiple retrievers using Reciprocal Rank Fusion (RRF) for improved results. Useful for combining different retrieval strategies.
```typescript
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';
import { VectorRetriever } from 'orkajs/retrievers/vector';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';

// Create individual retrievers
const vectorRetriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

const multiQueryRetriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Combine with weighted fusion
const ensemble = new EnsembleRetriever({
  retrievers: [vectorRetriever, multiQueryRetriever],
  weights: [0.4, 0.6], // 40% vector, 60% multi-query
  topK: 5              // Return top 5 fused results
});

const results = await ensemble.retrieve(
  'Explain RAG architecture',
  'my-knowledge-base'
);
```

## 🔬 Reciprocal Rank Fusion (RRF)
RRF combines rankings from multiple sources by giving higher scores to documents that appear in top positions across multiple retrievers.
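Weighted RRF fits in a few lines. The sketch below is illustrative, not the orkajs source; `rrfFuse` and its signature are assumptions. Each ranking is an ordered list of document IDs, best first, and ranks are 1-based with the standard constant k = 60.

```typescript
// Self-contained sketch of weighted Reciprocal Rank Fusion (illustrative,
// not the orkajs source). A document's fused score is the sum, over every
// ranking it appears in, of weight * 1 / (rank + k).
function rrfFuse(
  rankings: string[][],
  weights: number[],
  k = 60
): Map<string, number> {
  const scores = new Map<string, number>();
  rankings.forEach((ranking, i) => {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      const contribution = weights[i] * (1 / (rank + k));
      scores.set(docId, (scores.get(docId) ?? 0) + contribution);
    });
  });
  return scores;
}
```

Documents that appear near the top of several rankings accumulate contributions from each, which is why they outrank documents that score well in only one retriever.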
```typescript
// Formula: score = weight * (1 / (rank + 60))
// Document at rank 1 in Retriever A: 0.4 * (1/61) = 0.0066
// Same document at rank 3 in Retriever B: 0.6 * (1/63) = 0.0095
// Final fusion score: 0.0066 + 0.0095 = 0.0161
```

# VectorRetriever
Basic vector search wrapper that implements the Retriever interface. Useful as a building block for ensemble retrievers.
```typescript
import { VectorRetriever } from 'orkajs/retrievers/vector';

const retriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  minScore: 0.7 // Filter results below 0.7 similarity
});

const results = await retriever.retrieve(
  'What is RAG?',
  'my-knowledge-base'
);
```

# ParentDocumentRetriever
Searches on small child chunks for precision, then returns the full parent document for context. This solves the classic trade-off: small chunks are better for search accuracy, but large chunks provide more context for the LLM.
```typescript
import { ParentDocumentRetriever } from 'orkajs/retrievers/parent-document';

const retriever = new ParentDocumentRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  childTopK: 10, // Search top 10 child chunks
  parentTopK: 3, // Return top 3 parent documents
  minScore: 0.6
});

const results = await retriever.retrieve(
  'How does authentication work?',
  'documentation'
);
// Returns full parent documents, ranked by best child chunk score
// Each result includes metadata: { childCount, parentContent, ... }
```

## 📐 How It Works
1. Index documents as small chunks with `parentId` in metadata
2. Search finds the most relevant child chunks
3. Groups children by parent document
4. Returns full parent content, ranked by best child score
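The grouping and ranking steps (3 and 4) can be sketched without a vector store. The shapes below are illustrative assumptions, not the orkajs internals: each child chunk carries its `parentId`, and parents are ranked by their best-scoring child.

```typescript
// Sketch of the parent-grouping step (illustrative shapes, not the orkajs
// internals): collapse child hits to one entry per parent, keeping the
// best child score, then rank parents by that score.
interface ChildChunk {
  parentId: string;
  score: number;
}

function rankParents(
  children: ChildChunk[]
): { parentId: string; score: number }[] {
  const best = new Map<string, number>();
  for (const child of children) {
    const current = best.get(child.parentId) ?? -Infinity;
    if (child.score > current) best.set(child.parentId, child.score);
  }
  // One entry per parent, ranked by its strongest child
  return [...best.entries()]
    .map(([parentId, score]) => ({ parentId, score }))
    .sort((a, b) => b.score - a.score);
}
```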
# SelfQueryRetriever
Uses an LLM to automatically extract metadata filters from natural language queries. Instead of just semantic search, it combines meaning-based search with structured metadata filtering for more precise results.
```typescript
import { SelfQueryRetriever } from 'orkajs/retrievers/self-query';

const retriever = new SelfQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  metadataFields: [
    {
      name: 'category',
      type: 'string',
      description: 'The document category',
      enumValues: ['tutorial', 'api-reference', 'guide', 'changelog']
    },
    {
      name: 'language',
      type: 'string',
      description: 'Programming language',
      enumValues: ['typescript', 'python', 'javascript']
    },
    {
      name: 'version',
      type: 'number',
      description: 'The version number of the documentation'
    }
  ]
});

// Natural language query with implicit filters
const results = await retriever.retrieve(
  'Show me TypeScript tutorials about authentication in version 3',
  'documentation'
);
// The LLM extracts:
// semanticQuery: "authentication"
// filter: { language: "typescript", category: "tutorial", version: 3 }
```

## 🧠 Query Decomposition Example
The LLM automatically separates the semantic meaning from the structured filters:
```typescript
// User query: "Find Python guides about deployment from 2024"
// LLM extracts:
{
  "semanticQuery": "deployment",
  "filter": {
    "language": "python",
    "category": "guide"
  }
}
```

# BM25Retriever
A keyword-based retriever using the BM25 (Best Matching 25) algorithm. Unlike vector search which relies on semantic similarity, BM25 uses term frequency and inverse document frequency for exact keyword matching. Perfect for combining with vector search in an EnsembleRetriever.
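To make the `k1` and `b` knobs concrete, here is a minimal sketch of the per-term BM25 score (illustrative, not the orkajs implementation; `bm25TermScore` is a hypothetical helper). A document's total score for a query is the sum of this value over the query's terms.

```typescript
// Per-term BM25 score (illustrative sketch, not the orkajs source).
// k1 controls how quickly repeated occurrences of a term stop helping
// (term-frequency saturation); b controls how strongly long documents
// are penalized (length normalization).
function bm25TermScore(
  tf: number,        // term frequency in the document
  docLen: number,    // document length in tokens
  avgDocLen: number, // average document length in the corpus
  df: number,        // number of documents containing the term
  numDocs: number,   // total documents in the corpus
  k1 = 1.5,
  b = 0.75
): number {
  // IDF: rare terms (small df) contribute more; +1 keeps it non-negative
  const idf = Math.log(1 + (numDocs - df + 0.5) / (df + 0.5));
  // Denominator: saturates tf and normalizes by relative document length
  const norm = tf + k1 * (1 - b + (b * docLen) / avgDocLen);
  return (idf * (tf * (k1 + 1))) / norm;
}
```

With `b = 0` the `docLen / avgDocLen` term drops out entirely, which is why lowering `b` helps when your documents legitimately vary a lot in length.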
```typescript
import { BM25Retriever } from 'orkajs/retrievers/bm25';

const bm25 = new BM25Retriever({
  documents: [
    { id: '1', content: 'TypeScript is a typed superset of JavaScript...', metadata: { source: 'docs' } },
    { id: '2', content: 'React hooks allow you to use state in functional components...', metadata: { source: 'blog' } },
    { id: '3', content: 'Node.js is a JavaScript runtime built on Chrome V8...', metadata: { source: 'docs' } },
  ],
  topK: 5,
  k1: 1.5, // Term frequency saturation (default: 1.5)
  b: 0.75  // Document length normalization (default: 0.75)
});

const results = await bm25.retrieve('JavaScript runtime', 'any');
// Finds documents with exact keyword matches for "JavaScript" and "runtime"

// Add more documents dynamically
bm25.addDocuments([
  { id: '4', content: 'Deno is a modern JavaScript/TypeScript runtime...' }
]);
```

# BM25 + Vector Search (Hybrid)
The most powerful retrieval strategy combines BM25 (keyword matching) with vector search (semantic understanding) using the EnsembleRetriever:
```typescript
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';
import { VectorRetriever } from 'orkajs/retrievers/vector';
import { BM25Retriever } from 'orkajs/retrievers/bm25';

// Keyword-based retrieval
const bm25 = new BM25Retriever({
  documents: myDocuments,
  topK: 10
});

// Semantic retrieval
const vector = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

// Hybrid: combine both with Reciprocal Rank Fusion
const hybrid = new EnsembleRetriever({
  retrievers: [bm25, vector],
  weights: [0.3, 0.7], // 30% keyword, 70% semantic
  topK: 5
});

const results = await hybrid.retrieve('authentication middleware', 'docs');
// Finds docs matching keywords AND semantically similar content
```

# Comparison
| Retriever | Improves | Trade-off |
|---|---|---|
| MultiQuery | Recall (finds more relevant docs) | Extra LLM calls |
| Compression | Precision (removes noise) | LLM calls for compression |
| Ensemble | Both recall & precision | Multiple retrieval passes |
| Vector | Speed (single pass) | May miss variations |
| ParentDocument | Context (full doc with precise search) | Requires parentId metadata |
| SelfQuery | Filtering (structured + semantic) | LLM call for query parsing |
| BM25 | Keyword matching (exact terms) | No semantic understanding |
# Complete Example
```typescript
import { createOrka, OpenAIAdapter, PineconeAdapter } from 'orkajs';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';
import { ContextualCompressionRetriever } from 'orkajs/retrievers/compression';
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

// Strategy 1: Multi-query for better recall
const multiQuery = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Strategy 2: Compression for better precision
const compression = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 15,
  maxCompressedLength: 400
});

// Combine both strategies
const ensemble = new EnsembleRetriever({
  retrievers: [multiQuery, compression],
  weights: [0.5, 0.5],
  topK: 5
});

// Retrieve with best of both worlds
const results = await ensemble.retrieve(
  'How does RAG improve LLM responses?',
  'documentation'
);

console.log(`Found ${results.length} highly relevant results`);
results.forEach(r => {
  console.log(`Score: ${r.score.toFixed(3)}`);
  console.log(`Content: ${r.content?.slice(0, 100)}...`);
});
```

# Best Practices
1. **Start Simple**: Begin with VectorRetriever. Add MultiQuery if recall is low. Add Compression if precision is low.
2. **Monitor Costs**: MultiQuery and Compression make extra LLM calls. Use caching or limit query count in production.
3. **Tune Weights**: Experiment with ensemble weights based on your use case. Higher weight = more influence on final ranking.
# Tree-shaking Imports
```typescript
// ✅ Import only what you need
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';

// ✅ Or import from index
import { MultiQueryRetriever, ContextualCompressionRetriever } from 'orkajs/retrievers';
```