Multi-Model Orchestration
Route, race, balance, and build consensus between multiple LLM providers with Orka AI.
RouterLLM
Route requests to different models based on conditions
```ts
new RouterLLM({
  routes: [
    { condition: (p) => p.length > 500, adapter: gpt4o },
    { condition: (p) => p.includes('code'), adapter: gpt4o },
  ],
  defaultAdapter: gpt4oMini,
});
```

ConsensusLLM
Query multiple models and select the best response
```ts
new ConsensusLLM({
  adapters: [gpt4oMini, gpt4o],
  strategy: 'best_score', // 'majority' | 'merge'
  judge: gpt4o,
});
```

RaceLLM
Query multiple models in parallel, return the fastest
```ts
new RaceLLM({
  adapters: [openai, anthropic],
  timeout: 10000,
});
```

LoadBalancerLLM
Distribute requests across multiple adapters
```ts
new LoadBalancerLLM({
  adapters: [key1, key2, key3],
  strategy: 'round_robin', // 'random' | 'least_tokens'
});
```

# Combining Strategies
```ts
// Load-balanced pool for simple requests
const cheapPool = new LoadBalancerLLM({
  adapters: [miniAdapter1, miniAdapter2],
  strategy: 'round_robin',
});

// Fallback chain for complex requests
const powerfulChain = new FallbackLLM({
  adapters: [gpt4oAdapter, claudeAdapter],
});

// Router that picks the right strategy
const llm = new RouterLLM({
  routes: [
    { condition: (p) => p.length > 1000, adapter: powerfulChain },
  ],
  defaultAdapter: cheapPool,
});
```

💡 Pro Tip
All orchestration adapters implement LLMAdapter, so they can be used as the llm parameter in createOrka(), inside FallbackLLM, or anywhere an adapter is expected.
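To illustrate why this composability works, here is a standalone sketch (not orkajs code): anything that exposes the adapter's generate method can be nested inside another orchestrator, so a router can delegate to another router, a pool, or a plain adapter. The `MiniAdapter` interface and `MiniRouter` class below are hypothetical stand-ins invented for this example.

```typescript
// Minimal stand-in for an LLM adapter interface (illustrative, not orkajs's actual type)
interface MiniAdapter {
  generate(prompt: string): Promise<string>;
}

// A toy first-match router that itself satisfies MiniAdapter,
// so it can be nested anywhere an adapter is expected
class MiniRouter implements MiniAdapter {
  constructor(
    private routes: Array<{ condition: (p: string) => boolean; adapter: MiniAdapter }>,
    private defaultAdapter: MiniAdapter,
  ) {}

  async generate(prompt: string): Promise<string> {
    // First matching condition wins; otherwise fall through to the default
    const route = this.routes.find((r) => r.condition(prompt));
    return (route?.adapter ?? this.defaultAdapter).generate(prompt);
  }
}

const cheap: MiniAdapter = { generate: async () => 'cheap model answer' };
const strong: MiniAdapter = { generate: async () => 'strong model answer' };

// Routers nest: the outer router's default is itself another router
const inner = new MiniRouter(
  [{ condition: (p) => p.includes('code'), adapter: strong }],
  cheap,
);
const outer = new MiniRouter(
  [{ condition: (p) => p.length > 100, adapter: strong }],
  inner,
);
```

Because `MiniRouter` implements the same interface it consumes, the nesting depth is arbitrary, which is the same property the orkajs orchestrators get from implementing LLMAdapter.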
# RouterLLM — Intelligent Request Routing
RouterLLM evaluates each prompt against a list of conditions and routes to the appropriate model. Conditions are evaluated in order — the first matching condition wins. If no condition matches, the defaultAdapter is used.
```ts
import { RouterLLM } from 'orkajs/orchestration';
import { OpenAIAdapter } from 'orkajs/adapters/openai';
import { AnthropicAdapter } from 'orkajs/adapters/anthropic';

const gpt4o = new OpenAIAdapter({ apiKey: '...', model: 'gpt-4o' });
const gpt4oMini = new OpenAIAdapter({ apiKey: '...', model: 'gpt-4o-mini' });
const claude = new AnthropicAdapter({ apiKey: '...', model: 'claude-3-sonnet' });

const router = new RouterLLM({
  routes: [
    // Route long prompts to GPT-4o (better at complex reasoning)
    { condition: (prompt) => prompt.length > 2000, adapter: gpt4o },
    // Route code-related prompts to Claude (strong at code)
    { condition: (prompt) => /(code|function|class|import)/i.test(prompt), adapter: claude },
    // Route math/calculation prompts to GPT-4o
    { condition: (prompt) => /(calculate|compute|math|equation)/i.test(prompt), adapter: gpt4o },
  ],
  defaultAdapter: gpt4oMini, // Fast and cheap for simple queries
});

// Usage
const result = await router.generate('Write a Python function to sort a list');
// → Routed to Claude (matches 'function' keyword)
```

RouterLLM Parameters
- `routes: Route[]`: Array of `{ condition: (prompt) => boolean, adapter: LLMAdapter }` entries. Evaluated in order; the first match wins.
- `defaultAdapter: LLMAdapter`: Fallback adapter used when no route condition matches.
# ConsensusLLM — Multi-Model Agreement
ConsensusLLM queries multiple models and combines their responses using one of three strategies: best_score (a judge model picks the best), majority (most common answer wins), or merge (combine all responses into one).
```ts
import { ConsensusLLM } from 'orkajs/orchestration';

// Strategy 1: best_score — Judge picks the best response
const consensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude, mistral],
  strategy: 'best_score',
  judge: gpt4o, // A more powerful model evaluates responses
  judgePrompt: 'Rate these responses 1-10 for accuracy and helpfulness. Return the best one.',
});

// Strategy 2: majority — Most common answer wins (good for factual questions)
const majorityConsensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude, mistral],
  strategy: 'majority',
});

// Strategy 3: merge — Combine all responses (good for creative tasks)
const mergeConsensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude],
  strategy: 'merge',
  mergePrompt: 'Combine these responses into a comprehensive answer.',
});

const result = await consensus.generate('What is the capital of France?');
// All 3 models are queried in parallel, then the judge picks the best answer
```

ConsensusLLM Parameters
- `adapters: LLMAdapter[]`: Array of models to query. All are called in parallel.
- `strategy: 'best_score' | 'majority' | 'merge'`: How to select or combine responses. Default: `'best_score'`.
- `judge?: LLMAdapter`: Required for `'best_score'`. The model that evaluates and picks the best response.
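The `majority` strategy amounts to a vote over normalized responses. The sketch below shows the general idea in standalone code (it is not orkajs's implementation; the library may normalize or compare answers differently):

```typescript
// Pick the most common answer among candidate responses.
// Normalizing (trim, lowercase, strip trailing punctuation) lets
// "Paris", "paris." and " Paris " count as the same vote.
function majorityVote(responses: string[]): string {
  const normalize = (s: string) => s.trim().toLowerCase().replace(/[.,!?]+$/g, '');

  const counts = new Map<string, { count: number; original: string }>();
  for (const r of responses) {
    const key = normalize(r);
    const entry = counts.get(key) ?? { count: 0, original: r };
    entry.count += 1;
    counts.set(key, entry);
  }

  // Return the original (un-normalized) form of the winning answer
  let best = { count: 0, original: responses[0] };
  for (const entry of counts.values()) {
    if (entry.count > best.count) best = entry;
  }
  return best.original;
}
```

Note that exact-match voting like this suits short factual answers; free-form answers rarely repeat verbatim, which is why `best_score` with a judge is the default.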
# RaceLLM — Fastest Response Wins
RaceLLM queries multiple models in parallel and returns the first successful response. Ideal for latency-sensitive applications where you want the fastest available model.
```ts
import { RaceLLM } from 'orkajs/orchestration';

const race = new RaceLLM({
  adapters: [openai, anthropic, mistral],
  timeout: 10000, // 10s max wait time
});

const result = await race.generate('Quick question: what is 2+2?');
// Returns the first model to respond
// Other requests are cancelled (if supported by the adapter)

console.log(result.metadata?.adapter); // Which adapter won the race
```

RaceLLM Parameters
- `adapters: LLMAdapter[]`: Array of models to race. All are called simultaneously.
- `timeout?: number`: Maximum time to wait for any response (ms). Default: `30000`.
# LoadBalancerLLM — Distribute Load
LoadBalancerLLM distributes requests across multiple adapters (often the same model with different API keys) to avoid rate limits and improve throughput.
```ts
import { LoadBalancerLLM } from 'orkajs/orchestration';

// Multiple API keys for the same model
const key1 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_1! });
const key2 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_2! });
const key3 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_3! });

const balancer = new LoadBalancerLLM({
  adapters: [key1, key2, key3],
  strategy: 'round_robin', // Rotate through adapters
});

// Strategies:
// - 'round_robin': Cycle through adapters in order (1, 2, 3, 1, 2, 3...)
// - 'random': Pick a random adapter each time
// - 'least_tokens': Pick the adapter that has used the fewest tokens

// Make 100 requests — they'll be distributed across all 3 keys
for (let i = 0; i < 100; i++) {
  await balancer.generate('Hello!');
}
```

LoadBalancerLLM Parameters
- `adapters: LLMAdapter[]`: Array of adapters to balance load across.
- `strategy: 'round_robin' | 'random' | 'least_tokens'`: How to select the next adapter. Default: `'round_robin'`.
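The selection strategies themselves are simple to sketch. The hypothetical `AdapterPicker` below shows `round_robin` as a rotating index and `least_tokens` as a minimum over running usage counters; orkajs's internal bookkeeping may differ:

```typescript
type Strategy = 'round_robin' | 'random' | 'least_tokens';

// Tracks which adapter to use next. In a real balancer, tokensUsed
// would be updated from each response's usage metadata.
class AdapterPicker<T> {
  private next = 0;
  constructor(
    private adapters: T[],
    public tokensUsed: number[] = adapters.map(() => 0),
  ) {}

  pick(strategy: Strategy): T {
    switch (strategy) {
      case 'round_robin': {
        // Advance a cursor and wrap around: 1, 2, 3, 1, 2, 3...
        const chosen = this.adapters[this.next];
        this.next = (this.next + 1) % this.adapters.length;
        return chosen;
      }
      case 'random':
        return this.adapters[Math.floor(Math.random() * this.adapters.length)];
      case 'least_tokens': {
        // Pick the adapter with the smallest running token count
        let min = 0;
        for (let i = 1; i < this.tokensUsed.length; i++) {
          if (this.tokensUsed[i] < this.tokensUsed[min]) min = i;
        }
        return this.adapters[min];
      }
    }
  }
}
```

`round_robin` spreads requests evenly by count, while `least_tokens` spreads them by actual consumption, which matters when prompts vary widely in size against per-key token-rate limits.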
# Comparison
| Orchestrator | Use Case | API Calls | Latency |
|---|---|---|---|
| RouterLLM | Cost optimization, task-specific routing | 1 | Single model |
| ConsensusLLM | High accuracy, critical decisions | N + 1 (judge) | Parallel + judge |
| RaceLLM | Minimum latency | N (parallel) | Fastest model |
| LoadBalancerLLM | Rate limit avoidance, high throughput | 1 | Single model |
# Tree-shaking Imports
Import only what you need to minimize bundle size:
```ts
// ✅ Import only what you need
import { RouterLLM } from 'orkajs/orchestration/router';
import { ConsensusLLM } from 'orkajs/orchestration/consensus';
import { RaceLLM } from 'orkajs/orchestration/race';
import { LoadBalancerLLM } from 'orkajs/orchestration/load-balancer';

// ✅ Or import from the index
import { RouterLLM, ConsensusLLM, RaceLLM, LoadBalancerLLM } from 'orkajs/orchestration';
```