Multi-Model Orchestration
Route, race, balance, and build consensus between multiple LLM providers with Orka AI.
RouterLLM
Route requests to different models based on conditions
```ts
new RouterLLM({
  routes: [
    { condition: (p) => p.length > 500, adapter: gpt4o },
    { condition: (p) => p.includes('code'), adapter: gpt4o },
  ],
  defaultAdapter: gpt4oMini,
});
```

ConsensusLLM
Query multiple models and select the best response
```ts
new ConsensusLLM({
  adapters: [gpt4oMini, gpt4o],
  strategy: 'best_score', // 'majority' | 'merge'
  judge: gpt4o,
});
```

RaceLLM
Query multiple models in parallel, return the fastest
```ts
new RaceLLM({
  adapters: [openai, anthropic],
  timeout: 10000,
});
```

LoadBalancerLLM
Distribute requests across multiple adapters
```ts
new LoadBalancerLLM({
  adapters: [key1, key2, key3],
  strategy: 'round_robin', // 'random' | 'least_tokens'
});
```

# Combining Strategies
```ts
// Load-balanced pool for simple requests
const cheapPool = new LoadBalancerLLM({
  adapters: [miniAdapter1, miniAdapter2],
  strategy: 'round_robin',
});

// Fallback chain for complex requests
const powerfulChain = new FallbackLLM({
  adapters: [gpt4oAdapter, claudeAdapter],
});

// Router that picks the right strategy
const llm = new RouterLLM({
  routes: [
    { condition: (p) => p.length > 1000, adapter: powerfulChain },
  ],
  defaultAdapter: cheapPool,
});
```

💡 Pro Tip
All orchestration adapters implement LLMAdapter, so they can be used as the llm parameter in createOrka(), inside FallbackLLM, or anywhere an adapter is expected.
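To illustrate why this composability works, here is a standalone sketch (not orkajs code): anything that exposes the adapter's generate method can be nested inside another orchestrator, so a router can delegate to another router, a pool, or a plain adapter. The `MiniAdapter` interface and `MiniRouter` class below are hypothetical stand-ins invented for this example.

```typescript
// Minimal stand-in for an LLM adapter interface (illustrative, not orkajs's actual type)
interface MiniAdapter {
  generate(prompt: string): Promise<string>;
}

// A toy first-match router that itself satisfies MiniAdapter,
// so it can be nested anywhere an adapter is expected
class MiniRouter implements MiniAdapter {
  constructor(
    private routes: Array<{ condition: (p: string) => boolean; adapter: MiniAdapter }>,
    private defaultAdapter: MiniAdapter,
  ) {}

  async generate(prompt: string): Promise<string> {
    // First matching condition wins; otherwise fall through to the default
    const route = this.routes.find((r) => r.condition(prompt));
    return (route?.adapter ?? this.defaultAdapter).generate(prompt);
  }
}

const cheap: MiniAdapter = { generate: async () => 'cheap model answer' };
const strong: MiniAdapter = { generate: async () => 'strong model answer' };

// Routers nest: the outer router's default is itself another router
const inner = new MiniRouter(
  [{ condition: (p) => p.includes('code'), adapter: strong }],
  cheap,
);
const outer = new MiniRouter(
  [{ condition: (p) => p.length > 100, adapter: strong }],
  inner,
);
```

Because `MiniRouter` implements the same interface it consumes, the nesting depth is arbitrary, which is the same property the orkajs orchestrators get from implementing LLMAdapter.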
# RouterLLM — Intelligent Request Routing
RouterLLM evaluates each prompt against a list of conditions and routes to the appropriate model. Conditions are evaluated in order — the first matching condition wins. If no condition matches, the defaultAdapter is used.
```ts
import { RouterLLM } from 'orkajs/orchestration';
import { OpenAIAdapter } from 'orkajs/adapters/openai';
import { AnthropicAdapter } from 'orkajs/adapters/anthropic';

const gpt4o = new OpenAIAdapter({ apiKey: '...', model: 'gpt-4o' });
const gpt4oMini = new OpenAIAdapter({ apiKey: '...', model: 'gpt-4o-mini' });
const claude = new AnthropicAdapter({ apiKey: '...', model: 'claude-3-sonnet' });

const router = new RouterLLM({
  routes: [
    // Route long prompts to GPT-4o (better at complex reasoning)
    { condition: (prompt) => prompt.length > 2000, adapter: gpt4o },
    // Route code-related prompts to Claude (strong at code)
    { condition: (prompt) => /(code|function|class|import)/i.test(prompt), adapter: claude },
    // Route math/calculation prompts to GPT-4o
    { condition: (prompt) => /(calculate|compute|math|equation)/i.test(prompt), adapter: gpt4o },
  ],
  defaultAdapter: gpt4oMini, // Fast and cheap for simple queries
});

// Usage
const result = await router.generate('Write a Python function to sort a list');
// → Routed to Claude (matches 'function' keyword)
```

RouterLLM Parameters
- `routes: Route[]`: Array of `{ condition: (prompt) => boolean, adapter: LLMAdapter }` entries. Evaluated in order; the first match wins.
- `defaultAdapter: LLMAdapter`: Fallback adapter used when no route condition matches.
# ConsensusLLM — Multi-Model Agreement
ConsensusLLM queries multiple models and combines their responses using one of three strategies: best_score (a judge model picks the best), majority (most common answer wins), or merge (combine all responses into one).
```ts
import { ConsensusLLM } from 'orkajs/orchestration';

// Strategy 1: best_score — Judge picks the best response
const consensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude, mistral],
  strategy: 'best_score',
  judge: gpt4o, // A more powerful model evaluates responses
  judgePrompt: 'Rate these responses 1-10 for accuracy and helpfulness. Return the best one.',
});

// Strategy 2: majority — Most common answer wins (good for factual questions)
const majorityConsensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude, mistral],
  strategy: 'majority',
});

// Strategy 3: merge — Combine all responses (good for creative tasks)
const mergeConsensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude],
  strategy: 'merge',
  mergePrompt: 'Combine these responses into a comprehensive answer.',
});

const result = await consensus.generate('What is the capital of France?');
// All 3 models are queried in parallel, then the judge picks the best answer
```

ConsensusLLM Parameters
- `adapters: LLMAdapter[]`: Array of models to query. All are called in parallel.
- `strategy: 'best_score' | 'majority' | 'merge'`: How to select or combine responses. Default: `'best_score'`.
- `judge?: LLMAdapter`: Required for `'best_score'`. The model that evaluates and picks the best response.
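The `majority` strategy amounts to a vote over normalized responses. The sketch below shows the general idea in standalone code (it is not orkajs's implementation; the library may normalize or compare answers differently):

```typescript
// Pick the most common answer among candidate responses.
// Normalizing (trim, lowercase, strip trailing punctuation) lets
// "Paris", "paris." and " Paris " count as the same vote.
function majorityVote(responses: string[]): string {
  const normalize = (s: string) => s.trim().toLowerCase().replace(/[.,!?]+$/g, '');

  const counts = new Map<string, { count: number; original: string }>();
  for (const r of responses) {
    const key = normalize(r);
    const entry = counts.get(key) ?? { count: 0, original: r };
    entry.count += 1;
    counts.set(key, entry);
  }

  // Return the original (un-normalized) form of the winning answer
  let best = { count: 0, original: responses[0] };
  for (const entry of counts.values()) {
    if (entry.count > best.count) best = entry;
  }
  return best.original;
}
```

Note that exact-match voting like this suits short factual answers; free-form answers rarely repeat verbatim, which is why `best_score` with a judge is the default.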
# RaceLLM — Fastest Response Wins
RaceLLM queries multiple models in parallel and returns the first successful response. Ideal for latency-sensitive applications where you want the fastest available model.
```ts
import { RaceLLM } from 'orkajs/orchestration';

const race = new RaceLLM({
  adapters: [openai, anthropic, mistral],
  timeout: 10000, // 10s max wait time
});

const result = await race.generate('Quick question: what is 2+2?');
// Returns the first model to respond
// Other requests are cancelled (if supported by the adapter)

console.log(result.metadata?.adapter); // Which adapter won the race
```

RaceLLM Parameters
- `adapters: LLMAdapter[]`: Array of models to race. All are called simultaneously.
- `timeout?: number`: Maximum time to wait for any response (ms). Default: `30000`.
# LoadBalancerLLM — Distribute Load
LoadBalancerLLM distributes requests across multiple adapters (often the same model with different API keys) to avoid rate limits and improve throughput.
```ts
import { LoadBalancerLLM } from 'orkajs/orchestration';

// Multiple API keys for the same model
const key1 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_1! });
const key2 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_2! });
const key3 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_3! });

const balancer = new LoadBalancerLLM({
  adapters: [key1, key2, key3],
  strategy: 'round_robin', // Rotate through adapters
});

// Strategies:
// - 'round_robin': Cycle through adapters in order (1, 2, 3, 1, 2, 3...)
// - 'random': Pick a random adapter each time
// - 'least_tokens': Pick the adapter that has used the fewest tokens

// Make 100 requests — they'll be distributed across all 3 keys
for (let i = 0; i < 100; i++) {
  await balancer.generate('Hello!');
}
```

LoadBalancerLLM Parameters
- `adapters: LLMAdapter[]`: Array of adapters to balance load across.
- `strategy: 'round_robin' | 'random' | 'least_tokens'`: How to select the next adapter. Default: `'round_robin'`.
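The selection strategies themselves are simple to sketch. The hypothetical `AdapterPicker` below shows `round_robin` as a rotating index and `least_tokens` as a minimum over running usage counters; orkajs's internal bookkeeping may differ:

```typescript
type Strategy = 'round_robin' | 'random' | 'least_tokens';

// Tracks which adapter to use next. In a real balancer, tokensUsed
// would be updated from each response's usage metadata.
class AdapterPicker<T> {
  private next = 0;
  constructor(
    private adapters: T[],
    public tokensUsed: number[] = adapters.map(() => 0),
  ) {}

  pick(strategy: Strategy): T {
    switch (strategy) {
      case 'round_robin': {
        // Advance a cursor and wrap around: 1, 2, 3, 1, 2, 3...
        const chosen = this.adapters[this.next];
        this.next = (this.next + 1) % this.adapters.length;
        return chosen;
      }
      case 'random':
        return this.adapters[Math.floor(Math.random() * this.adapters.length)];
      case 'least_tokens': {
        // Pick the adapter with the smallest running token count
        let min = 0;
        for (let i = 1; i < this.tokensUsed.length; i++) {
          if (this.tokensUsed[i] < this.tokensUsed[min]) min = i;
        }
        return this.adapters[min];
      }
    }
  }
}
```

`round_robin` spreads requests evenly by count, while `least_tokens` spreads them by actual consumption, which matters when prompts vary widely in size against per-key token-rate limits.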
# Comparison
| Orchestrator | Use Case | API Calls | Latency |
|---|---|---|---|
| RouterLLM | Cost optimization, task-specific routing | 1 | Single model |
| ConsensusLLM | High accuracy, critical decisions | N + 1 (judge) | Parallel + judge |
| RaceLLM | Minimum latency | N (parallel) | Fastest model |
| LoadBalancerLLM | Rate limit avoidance, high throughput | 1 | Single model |
# Tree-shaking Imports
Import only what you need to minimize bundle size:
```ts
// ✅ Import only what you need
import { RouterLLM } from 'orkajs/orchestration/router';
import { ConsensusLLM } from 'orkajs/orchestration/consensus';
import { RaceLLM } from 'orkajs/orchestration/race';
import { LoadBalancerLLM } from 'orkajs/orchestration/load-balancer';

// ✅ Or import from the index
import { RouterLLM, ConsensusLLM, RaceLLM, LoadBalancerLLM } from 'orkajs/orchestration';
```