Multi-Model Orchestration
Route, race, balance, and combine multiple LLM providers intelligently.
Orka's orchestration layer lets you combine multiple LLM providers for optimal cost, latency, and quality.
The four strategies at a glance:

- RouterLLM — route by condition (e.g. simple → claude-sonnet-4.5, complex → claude-opus-4.5)
- ConsensusLLM — best of N: each model produces a response and a judge picks the best score
- RaceLLM — fastest wins: the first response (e.g. claude-opus-4.5 in 120ms) is returned, the rest are cancelled
- LoadBalancerLLM — distribute load across instances (round_robin | weighted | random, e.g. A: 50% / B: 50%)
1. RouterLLM - Smart Routing
Route requests to different models based on complexity, content, or custom conditions:
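Conceptually, condition-based routing is a first-match scan over the route list, falling back to the default adapter. The sketch below is an illustrative mental model only, not Orka's internal code; `pickAdapter` and the string adapter names are hypothetical stand-ins:

```typescript
// Illustrative sketch of first-match routing (not Orka's internals).
type Route = { name: string; condition: (prompt: string) => boolean; adapter: string };

function pickAdapter(routes: Route[], defaultAdapter: string, prompt: string): string {
  // Routes are checked in order; the first matching condition wins.
  for (const route of routes) {
    if (route.condition(prompt)) return route.adapter;
  }
  // No route matched: fall back to the default adapter.
  return defaultAdapter;
}

const routes: Route[] = [
  { name: 'complex', condition: (p) => p.length > 500 || p.includes('analyse'), adapter: 'smart' },
  { name: 'code', condition: (p) => p.includes('code'), adapter: 'smart' },
];

console.log(pickAdapter(routes, 'fast', 'Say hello.'));         // → fast
console.log(pickAdapter(routes, 'fast', 'Refactor this code')); // → smart
```

Because routes are ordered, put the most specific conditions first.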
router-llm.ts
```typescript
import { createOrka, AnthropicAdapter, MemoryVectorAdapter, RouterLLM } from 'orkajs';

const fast = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' });
const smart = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' });

const llm = new RouterLLM({
  routes: [
    {
      name: 'complex',
      condition: (prompt) => prompt.length > 500 || prompt.includes('analyse') || prompt.includes('compare'),
      adapter: smart,
    },
    {
      name: 'code',
      condition: (prompt) => prompt.includes('code') || prompt.includes('function') || prompt.includes('```'),
      adapter: smart,
    },
  ],
  defaultAdapter: fast, // Fallback for simple queries
});

const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });

// Simple query → claude-sonnet-4.5
const simple = await orka.generate('Say hello in one sentence.');
console.log('Simple: ' + simple);

// Complex query → claude-opus-4.5
const complex = await orka.generate('Analyze the advantages and disadvantages of TypeScript vs JavaScript.');
console.log('Complex: ' + complex);
```
2. ConsensusLLM - Best of N
Query multiple models and use a judge to select the best response:
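With the `best_score` strategy, the selection step reduces to an argmax over judge scores. As an illustrative sketch (not Orka's internals; `bestScore` and the toy length-based judge are hypothetical):

```typescript
// Illustrative sketch of 'best_score' consensus selection (not Orka's internals).
// Each adapter produces a candidate; a judge assigns a score; the top score wins.
function bestScore<T>(candidates: T[], score: (c: T) => number): T {
  let best = candidates[0];
  let bestVal = score(best);
  for (const c of candidates.slice(1)) {
    const v = score(c);
    if (v > bestVal) {
      best = c;
      bestVal = v;
    }
  }
  return best;
}

// Toy judge that prefers longer, more detailed answers.
const answers = [
  'Recursion is self-reference.',
  'Recursion is when a function calls itself until a base case stops it.',
];
console.log(bestScore(answers, (a) => a.length)); // prints the second, longer answer
```

In practice the judge is itself an LLM, so scores cost an extra call; reserve this strategy for quality-critical paths.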
consensus-llm.ts
```typescript
import { ConsensusLLM, AnthropicAdapter, MemoryVectorAdapter, createOrka } from 'orkajs';

const llm = new ConsensusLLM({
  adapters: [
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
  ],
  strategy: 'best_score', // 'best_score' | 'majority_vote' | 'first_valid'
  judge: new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
});

const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });

// Both models generate results, and the judge chooses the best answer.
const result = await orka.generate('Explain recursion in programming.');
console.log('Consensus: ' + result.slice(0, 200) + '...');
```
3. RaceLLM - Fastest Wins
Race multiple models and return the first response:
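The underlying pattern is `Promise.race` plus cancellation of the losers. The sketch below shows the idea with plain timers; it is illustrative only (not Orka's internals), and the `raceTasks` helper is hypothetical:

```typescript
// Illustrative sketch of race-with-cancellation (not Orka's internals).
async function raceTasks(tasks: ((signal: AbortSignal) => Promise<string>)[]): Promise<string> {
  const controller = new AbortController();
  // Start all tasks at once; the first to resolve wins.
  const winner = await Promise.race(tasks.map((t) => t(controller.signal)));
  // Signal the losers so their in-flight work can be cancelled.
  controller.abort();
  return winner;
}

// Two mock "models": one slow, one fast, both cancellable via the signal.
const slow = (signal: AbortSignal) =>
  new Promise<string>((resolve) => {
    const id = setTimeout(() => resolve('slow'), 200);
    signal.addEventListener('abort', () => clearTimeout(id));
  });
const quick = (signal: AbortSignal) =>
  new Promise<string>((resolve) => {
    const id = setTimeout(() => resolve('quick'), 10);
    signal.addEventListener('abort', () => clearTimeout(id));
  });

raceTasks([slow, quick]).then(console.log); // → quick
```

Real HTTP clients accept the `AbortSignal` directly, so a cancelled request actually stops consuming tokens.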
race-llm.ts
```typescript
import { RaceLLM, AnthropicAdapter, MemoryVectorAdapter, createOrka } from 'orkajs';

const llm = new RaceLLM({
  adapters: [
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
  ],
  timeout: 10000, // Global timeout in ms
});

const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });

// The first model to respond wins; the others are cancelled.
const result = await orka.generate('What is TypeScript?');
console.log('Race winner: ' + result.slice(0, 200) + '...');
```
4. LoadBalancerLLM - Distribute Load
Distribute requests across multiple model instances:
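As a mental model for the `round_robin` and `weighted` strategies, selection can be sketched as cycling through a pool (for `weighted`, a pool where each adapter appears in proportion to its weight). This is illustrative only, not Orka's internals; `makeRoundRobin` and `makeWeighted` are hypothetical helpers:

```typescript
// Illustrative sketch of round-robin and weighted selection (not Orka's internals).
function makeRoundRobin<T>(adapters: T[]): () => T {
  let i = 0;
  // Each call returns the next adapter, wrapping around at the end.
  return () => adapters[i++ % adapters.length];
}

// Weighted: expand each adapter proportionally to its weight, then cycle.
function makeWeighted<T>(entries: { adapter: T; weight: number }[]): () => T {
  const pool = entries.flatMap(({ adapter, weight }) => Array(weight).fill(adapter) as T[]);
  return makeRoundRobin(pool);
}

const next = makeRoundRobin(['A', 'B']);
console.log(next(), next(), next(), next()); // → A B A B

const weighted = makeWeighted([{ adapter: 'A', weight: 3 }, { adapter: 'B', weight: 1 }]);
// Over 4 picks, A is chosen 3 times and B once.
```

The `random` strategy simply picks uniformly at random, which balances well in expectation but not deterministically.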
load-balancer-llm.ts
```typescript
import { LoadBalancerLLM, AnthropicAdapter, MemoryVectorAdapter, createOrka } from 'orkajs';

const lb = new LoadBalancerLLM({
  adapters: [
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
  ],
  strategy: 'round_robin', // 'round_robin' | 'weighted' | 'random'
});

const orka = createOrka({ llm: lb, vectorDB: new MemoryVectorAdapter() });

// Requests are distributed round-robin across the adapters.
for (let i = 0; i < 4; i++) {
  await orka.generate('Question ' + (i + 1));
}

// Usage statistics
console.log('Stats:', lb.getStats());
// { adapter_0: { requests: 2, avgLatency: 150 }, adapter_1: { requests: 2, avgLatency: 145 } }
```
5. Complete Example
multi-model-complete.ts
```typescript
import {
  createOrka,
  AnthropicAdapter,
  MemoryVectorAdapter,
  RouterLLM,
  ConsensusLLM,
  RaceLLM,
  LoadBalancerLLM,
} from 'orkajs';

async function routerExample() {
  console.log('🔀 Router: Route by complexity\n');
  const fast = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' });
  const smart = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' });
  const llm = new RouterLLM({
    routes: [
      { name: 'complex', condition: (p) => p.length > 500 || p.includes('analyse'), adapter: smart },
      { name: 'code', condition: (p) => p.includes('code') || p.includes('```'), adapter: smart },
    ],
    defaultAdapter: fast,
  });
  const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });
  const simple = await orka.generate('Say hello.');
  console.log('Simple (claude-sonnet-4.5): ' + simple);
}

async function consensusExample() {
  console.log('🤝 Consensus: Best answer of 2 models\n');
  const llm = new ConsensusLLM({
    adapters: [
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
    ],
    strategy: 'best_score',
    judge: new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
  });
  const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });
  const result = await orka.generate('Explain recursion.');
  console.log('Consensus: ' + result.slice(0, 200) + '...');
}

async function raceExample() {
  console.log('🏎️ Race: The fastest wins\n');
  const llm = new RaceLLM({
    adapters: [
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
    ],
    timeout: 10000,
  });
  const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });
  const result = await orka.generate('What is TypeScript?');
  console.log('Race winner: ' + result.slice(0, 200) + '...');
}

async function loadBalancerExample() {
  console.log('⚖️ Load Balancer: Round-robin\n');
  const lb = new LoadBalancerLLM({
    adapters: [
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    ],
    strategy: 'round_robin',
  });
  const orka = createOrka({ llm: lb, vectorDB: new MemoryVectorAdapter() });
  for (let i = 0; i < 4; i++) {
    await orka.generate('Question ' + (i + 1));
  }
  console.log('Stats:', lb.getStats());
}

async function main() {
  await routerExample();
  await consensusExample();
  await raceExample();
  await loadBalancerExample();
}

main().catch(console.error);
```
When to Use Each Strategy
🔀 RouterLLM
Cost optimization: send simple queries to cheaper models and reserve expensive models for complex ones.
🤝 ConsensusLLM
Quality critical: get the best answer by comparing multiple models.
🏎️ RaceLLM
Latency critical: get the fastest response regardless of which model produces it.
⚖️ LoadBalancerLLM
High throughput: distribute load across multiple API keys or instances.