Advanced Retrievers
Improve retrieval quality with multi-query expansion, contextual compression, and ensemble methods.
# Why Advanced Retrievers?
Basic vector search can miss relevant documents due to query phrasing or semantic gaps. Advanced retrievers improve recall and precision through query expansion, result compression, and fusion techniques.
# MultiQueryRetriever
Generates multiple alternative queries using an LLM, retrieves results for each, and deduplicates to improve recall.
```typescript
import { createOrka, OpenAIAdapter, PineconeAdapter } from 'orkajs';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

const retriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,             // Generate 3 alternative queries
  topK: 5,                   // Return top 5 results per query
  deduplicateByContent: true // Remove duplicate results
});

const results = await retriever.retrieve(
  'How do I configure Orka AI?',
  'my-knowledge-base'
);
// Returns deduplicated results from all query variations
```

## 🎯 How It Works
1. Original query: "How do I configure Orka AI?"
2. LLM generates alternatives:
   - "What are the configuration options for Orka AI?"
   - "How to set up Orka AI settings?"
   - "Orka AI configuration guide"
3. Searches with all queries
4. Deduplicates and ranks results
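The final merge-and-deduplicate step needs no LLM or vector store, so it can be sketched on its own. The shapes and function below are illustrative, not the orkajs internals: given one result list per query variant, keep each unique content string once, with its best score.

```typescript
// Minimal sketch of the multi-query merge step (illustrative names, not
// the orkajs internals): union the per-query result lists, drop duplicates
// by content, and keep each document's highest score across variants.
interface RetrievedDoc {
  id: string;
  content: string;
  score: number;
}

function mergeAndDeduplicate(resultSets: RetrievedDoc[][]): RetrievedDoc[] {
  const byContent = new Map<string, RetrievedDoc>();
  for (const results of resultSets) {
    for (const doc of results) {
      const existing = byContent.get(doc.content);
      // Keep whichever copy scored highest across the query variants
      if (!existing || doc.score > existing.score) {
        byContent.set(doc.content, doc);
      }
    }
  }
  // Rank the merged pool by score, best first
  return [...byContent.values()].sort((a, b) => b.score - a.score);
}
```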
# ContextualCompressionRetriever
Retrieves more documents than needed, then uses an LLM to extract only the relevant parts, improving precision and reducing context size.
```typescript
import { ContextualCompressionRetriever } from 'orkajs/retrievers/compression';

const retriever = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10,                // Retrieve 10 documents
  maxCompressedLength: 500 // Compress each to ~500 chars
});

const results = await retriever.retrieve(
  'What are the benefits of RAG?',
  'my-knowledge-base'
);
// Each result contains only the relevant extract, not the full document
```

## ❌ Without Compression
Returns full documents (1000+ chars each) with irrelevant sections, wasting context window.
## ✅ With Compression
Returns only relevant extracts (200-500 chars), maximizing context efficiency and relevance.
# EnsembleRetriever
Combines multiple retrievers using Reciprocal Rank Fusion (RRF) for improved results. Useful for combining different retrieval strategies.
```typescript
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';
import { VectorRetriever } from 'orkajs/retrievers/vector';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';

// Create individual retrievers
const vectorRetriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

const multiQueryRetriever = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Combine with weighted fusion
const ensemble = new EnsembleRetriever({
  retrievers: [vectorRetriever, multiQueryRetriever],
  weights: [0.4, 0.6], // 40% vector, 60% multi-query
  topK: 5              // Return top 5 fused results
});

const results = await ensemble.retrieve(
  'Explain RAG architecture',
  'my-knowledge-base'
);
```

## 🔬 Reciprocal Rank Fusion (RRF)
RRF combines rankings from multiple sources by giving higher scores to documents that appear in top positions across multiple retrievers.
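Weighted RRF fits in a few lines. The sketch below is illustrative, not the orkajs source; `rrfFuse` and its signature are assumptions. Each ranking is an ordered list of document IDs, best first, and ranks are 1-based with the standard constant k = 60.

```typescript
// Self-contained sketch of weighted Reciprocal Rank Fusion (illustrative,
// not the orkajs source). A document's fused score is the sum, over every
// ranking it appears in, of weight * 1 / (rank + k).
function rrfFuse(
  rankings: string[][],
  weights: number[],
  k = 60
): Map<string, number> {
  const scores = new Map<string, number>();
  rankings.forEach((ranking, i) => {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      const contribution = weights[i] * (1 / (rank + k));
      scores.set(docId, (scores.get(docId) ?? 0) + contribution);
    });
  });
  return scores;
}
```

Documents that appear near the top of several rankings accumulate contributions from each, which is why they outrank documents that score well in only one retriever.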
```typescript
// Formula: score = weight * (1 / (rank + 60))
// Document at rank 1 in Retriever A: 0.4 * (1/61) = 0.0066
// Same document at rank 3 in Retriever B: 0.6 * (1/63) = 0.0095
// Final fusion score: 0.0066 + 0.0095 = 0.0161
```

# VectorRetriever
Basic vector search wrapper that implements the Retriever interface. Useful as a building block for ensemble retrievers.
```typescript
import { VectorRetriever } from 'orkajs/retrievers/vector';

const retriever = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  minScore: 0.7 // Filter results below 0.7 similarity
});

const results = await retriever.retrieve(
  'What is RAG?',
  'my-knowledge-base'
);
```

# ParentDocumentRetriever
Searches on small child chunks for precision, then returns the full parent document for context. This solves the classic trade-off: small chunks are better for search accuracy, but large chunks provide more context for the LLM.
```typescript
import { ParentDocumentRetriever } from 'orkajs/retrievers/parent-document';

const retriever = new ParentDocumentRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  childTopK: 10, // Search top 10 child chunks
  parentTopK: 3, // Return top 3 parent documents
  minScore: 0.6
});

const results = await retriever.retrieve(
  'How does authentication work?',
  'documentation'
);
// Returns full parent documents, ranked by best child chunk score
// Each result includes metadata: { childCount, parentContent, ... }
```

## 📐 How It Works
1. Index documents as small chunks with `parentId` in metadata
2. Search finds the most relevant child chunks
3. Groups children by parent document
4. Returns full parent content, ranked by best child score
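The grouping and ranking steps (3 and 4) can be sketched without a vector store. The shapes below are illustrative assumptions, not the orkajs internals: each child chunk carries its `parentId`, and parents are ranked by their best-scoring child.

```typescript
// Sketch of the parent-grouping step (illustrative shapes, not the orkajs
// internals): collapse child hits to one entry per parent, keeping the
// best child score, then rank parents by that score.
interface ChildChunk {
  parentId: string;
  score: number;
}

function rankParents(
  children: ChildChunk[]
): { parentId: string; score: number }[] {
  const best = new Map<string, number>();
  for (const child of children) {
    const current = best.get(child.parentId) ?? -Infinity;
    if (child.score > current) best.set(child.parentId, child.score);
  }
  // One entry per parent, ranked by its strongest child
  return [...best.entries()]
    .map(([parentId, score]) => ({ parentId, score }))
    .sort((a, b) => b.score - a.score);
}
```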
# SelfQueryRetriever
Uses an LLM to automatically extract metadata filters from natural language queries. Instead of just semantic search, it combines meaning-based search with structured metadata filtering for more precise results.
```typescript
import { SelfQueryRetriever } from 'orkajs/retrievers/self-query';

const retriever = new SelfQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 5,
  metadataFields: [
    {
      name: 'category',
      type: 'string',
      description: 'The document category',
      enumValues: ['tutorial', 'api-reference', 'guide', 'changelog']
    },
    {
      name: 'language',
      type: 'string',
      description: 'Programming language',
      enumValues: ['typescript', 'python', 'javascript']
    },
    {
      name: 'version',
      type: 'number',
      description: 'The version number of the documentation'
    }
  ]
});

// Natural language query with implicit filters
const results = await retriever.retrieve(
  'Show me TypeScript tutorials about authentication in version 3',
  'documentation'
);
// The LLM extracts:
// semanticQuery: "authentication"
// filter: { language: "typescript", category: "tutorial", version: 3 }
```

## 🧠 Query Decomposition Example
The LLM automatically separates the semantic meaning from the structured filters:
```typescript
// User query: "Find Python guides about deployment from 2024"
// LLM extracts:
{
  "semanticQuery": "deployment",
  "filter": {
    "language": "python",
    "category": "guide"
  }
}
```

# BM25Retriever
A keyword-based retriever using the BM25 (Best Matching 25) algorithm. Unlike vector search which relies on semantic similarity, BM25 uses term frequency and inverse document frequency for exact keyword matching. Perfect for combining with vector search in an EnsembleRetriever.
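To make the `k1` and `b` knobs concrete, here is a minimal sketch of the per-term BM25 score (illustrative, not the orkajs implementation; `bm25TermScore` is a hypothetical helper). A document's total score for a query is the sum of this value over the query's terms.

```typescript
// Per-term BM25 score (illustrative sketch, not the orkajs source).
// k1 controls how quickly repeated occurrences of a term stop helping
// (term-frequency saturation); b controls how strongly long documents
// are penalized (length normalization).
function bm25TermScore(
  tf: number,        // term frequency in the document
  docLen: number,    // document length in tokens
  avgDocLen: number, // average document length in the corpus
  df: number,        // number of documents containing the term
  numDocs: number,   // total documents in the corpus
  k1 = 1.5,
  b = 0.75
): number {
  // IDF: rare terms (small df) contribute more; +1 keeps it non-negative
  const idf = Math.log(1 + (numDocs - df + 0.5) / (df + 0.5));
  // Denominator: saturates tf and normalizes by relative document length
  const norm = tf + k1 * (1 - b + (b * docLen) / avgDocLen);
  return (idf * (tf * (k1 + 1))) / norm;
}
```

With `b = 0` the `docLen / avgDocLen` term drops out entirely, which is why lowering `b` helps when your documents legitimately vary a lot in length.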
```typescript
import { BM25Retriever } from 'orkajs/retrievers/bm25';

const bm25 = new BM25Retriever({
  documents: [
    { id: '1', content: 'TypeScript is a typed superset of JavaScript...', metadata: { source: 'docs' } },
    { id: '2', content: 'React hooks allow you to use state in functional components...', metadata: { source: 'blog' } },
    { id: '3', content: 'Node.js is a JavaScript runtime built on Chrome V8...', metadata: { source: 'docs' } },
  ],
  topK: 5,
  k1: 1.5, // Term frequency saturation (default: 1.5)
  b: 0.75  // Document length normalization (default: 0.75)
});

const results = await bm25.retrieve('JavaScript runtime', 'any');
// Finds documents with exact keyword matches for "JavaScript" and "runtime"

// Add more documents dynamically
bm25.addDocuments([
  { id: '4', content: 'Deno is a modern JavaScript/TypeScript runtime...' }
]);
```

# BM25 + Vector Search (Hybrid)
The most powerful retrieval strategy combines BM25 (keyword matching) with vector search (semantic understanding) using the EnsembleRetriever:
```typescript
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';
import { VectorRetriever } from 'orkajs/retrievers/vector';
import { BM25Retriever } from 'orkajs/retrievers/bm25';

// Keyword-based retrieval
const bm25 = new BM25Retriever({
  documents: myDocuments,
  topK: 10
});

// Semantic retrieval
const vector = new VectorRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 10
});

// Hybrid: combine both with Reciprocal Rank Fusion
const hybrid = new EnsembleRetriever({
  retrievers: [bm25, vector],
  weights: [0.3, 0.7], // 30% keyword, 70% semantic
  topK: 5
});

const results = await hybrid.retrieve('authentication middleware', 'docs');
// Finds docs matching keywords AND semantically similar content
```

# Comparison
| Retriever | Improves | Trade-off |
|---|---|---|
| MultiQuery | Recall (finds more relevant docs) | Extra LLM calls |
| Compression | Precision (removes noise) | LLM calls for compression |
| Ensemble | Both recall & precision | Multiple retrieval passes |
| Vector | Speed (single pass) | May miss variations |
| ParentDocument | Context (full doc with precise search) | Requires parentId metadata |
| SelfQuery | Filtering (structured + semantic) | LLM call for query parsing |
| BM25 | Keyword matching (exact terms) | No semantic understanding |
# Complete Example
```typescript
import { createOrka, OpenAIAdapter, PineconeAdapter } from 'orkajs';
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';
import { ContextualCompressionRetriever } from 'orkajs/retrievers/compression';
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';

const orka = createOrka({
  llm: new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! }),
  vectorDB: new PineconeAdapter({ /* config */ })
});

// Strategy 1: Multi-query for better recall
const multiQuery = new MultiQueryRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  queryCount: 3,
  topK: 10
});

// Strategy 2: Compression for better precision
const compression = new ContextualCompressionRetriever({
  llm: orka.getLLM(),
  vectorDB: orka.knowledge['vectorDB'],
  topK: 15,
  maxCompressedLength: 400
});

// Combine both strategies
const ensemble = new EnsembleRetriever({
  retrievers: [multiQuery, compression],
  weights: [0.5, 0.5],
  topK: 5
});

// Retrieve with best of both worlds
const results = await ensemble.retrieve(
  'How does RAG improve LLM responses?',
  'documentation'
);

console.log(`Found ${results.length} highly relevant results`);
results.forEach(r => {
  console.log(`Score: ${r.score.toFixed(3)}`);
  console.log(`Content: ${r.content?.slice(0, 100)}...`);
});
```

# Best Practices
1. **Start Simple**: Begin with VectorRetriever. Add MultiQuery if recall is low. Add Compression if precision is low.
2. **Monitor Costs**: MultiQuery and Compression make extra LLM calls. Use caching or limit query count in production.
3. **Tune Weights**: Experiment with ensemble weights based on your use case. Higher weight = more influence on final ranking.
# Tree-shaking Imports
```typescript
// ✅ Import only what you need
import { MultiQueryRetriever } from 'orkajs/retrievers/multi-query';
import { EnsembleRetriever } from 'orkajs/retrievers/ensemble';

// ✅ Or import from index
import { MultiQueryRetriever, ContextualCompressionRetriever } from 'orkajs/retrievers';
```