OrkaJS

Multi-Model Orchestration

Route, race, balance, and build consensus between multiple LLM providers with Orka AI.

RouterLLM

Route requests to different models based on conditions

```typescript
new RouterLLM({
  routes: [
    { condition: (p) => p.length > 500, adapter: gpt4o },
    { condition: (p) => p.includes('code'), adapter: gpt4o },
  ],
  defaultAdapter: gpt4oMini,
})
```

ConsensusLLM

Query multiple models and select the best response

```typescript
new ConsensusLLM({
  adapters: [gpt4oMini, gpt4o],
  strategy: 'best_score', // 'majority' | 'merge'
  judge: gpt4o,
})
```

RaceLLM

Query multiple models in parallel, return the fastest

```typescript
new RaceLLM({
  adapters: [openai, anthropic],
  timeout: 10000,
})
```

LoadBalancerLLM

Distribute requests across multiple adapters

```typescript
new LoadBalancerLLM({
  adapters: [key1, key2, key3],
  strategy: 'round_robin', // 'random' | 'least_tokens'
})
```

# Combining Strategies

```typescript
// Load-balanced pool for simple requests
const cheapPool = new LoadBalancerLLM({
  adapters: [miniAdapter1, miniAdapter2],
  strategy: 'round_robin',
});

// Fallback chain for complex requests
const powerfulChain = new FallbackLLM({
  adapters: [gpt4oAdapter, claudeAdapter],
});

// Router that picks the right strategy
const llm = new RouterLLM({
  routes: [
    { condition: (p) => p.length > 1000, adapter: powerfulChain },
  ],
  defaultAdapter: cheapPool,
});
```

💡 Pro Tip

All orchestration adapters implement LLMAdapter, so they can be used as the llm parameter in createOrka(), inside FallbackLLM, or anywhere an adapter is expected.

# RouterLLM — Intelligent Request Routing

RouterLLM evaluates each prompt against a list of conditions and routes to the appropriate model. Conditions are evaluated in order — the first matching condition wins. If no condition matches, the defaultAdapter is used.

```typescript
import { RouterLLM } from 'orkajs/orchestration';
import { OpenAIAdapter } from 'orkajs/adapters/openai';
import { AnthropicAdapter } from 'orkajs/adapters/anthropic';

const gpt4o = new OpenAIAdapter({ apiKey: '...', model: 'gpt-4o' });
const gpt4oMini = new OpenAIAdapter({ apiKey: '...', model: 'gpt-4o-mini' });
const claude = new AnthropicAdapter({ apiKey: '...', model: 'claude-3-sonnet' });

const router = new RouterLLM({
  routes: [
    // Route long prompts to GPT-4o (better at complex reasoning)
    { condition: (prompt) => prompt.length > 2000, adapter: gpt4o },

    // Route code-related prompts to Claude (strong at code)
    { condition: (prompt) => /(code|function|class|import)/i.test(prompt), adapter: claude },

    // Route math/calculation prompts to GPT-4o
    { condition: (prompt) => /(calculate|compute|math|equation)/i.test(prompt), adapter: gpt4o },
  ],
  defaultAdapter: gpt4oMini, // Fast and cheap for simple queries
});

// Usage
const result = await router.generate('Write a Python function to sort a list');
// → Routed to Claude (matches 'function' keyword)
```

RouterLLM Parameters

routes: Route[]

Array of { condition: (prompt) => boolean, adapter: LLMAdapter }. Evaluated in order.

defaultAdapter: LLMAdapter

Fallback adapter when no route condition matches.
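The first-match-wins behavior can be modeled in a few lines. This is an illustrative sketch of the selection rule only, not OrkaJS source; the adapter type is left generic for clarity:

```typescript
// Simplified model of RouterLLM's route selection (illustrative only).
interface Route<A> {
  condition: (prompt: string) => boolean;
  adapter: A;
}

// Scan routes in order; the first matching condition wins,
// otherwise fall back to the default adapter.
function selectAdapter<A>(routes: Route<A>[], defaultAdapter: A, prompt: string): A {
  for (const route of routes) {
    if (route.condition(prompt)) return route.adapter;
  }
  return defaultAdapter;
}
```

Because evaluation stops at the first match, put your most specific conditions first.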

# ConsensusLLM — Multi-Model Agreement

ConsensusLLM queries multiple models and combines their responses using one of three strategies: best_score (a judge model picks the best), majority (most common answer wins), or merge (combine all responses into one).

```typescript
import { ConsensusLLM } from 'orkajs/orchestration';

// Strategy 1: best_score — Judge picks the best response
const consensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude, mistral],
  strategy: 'best_score',
  judge: gpt4o, // A more powerful model evaluates responses
  judgePrompt: 'Rate these responses 1-10 for accuracy and helpfulness. Return the best one.',
});

// Strategy 2: majority — Most common answer wins (good for factual questions)
const majorityConsensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude, mistral],
  strategy: 'majority',
});

// Strategy 3: merge — Combine all responses (good for creative tasks)
const mergeConsensus = new ConsensusLLM({
  adapters: [gpt4oMini, claude],
  strategy: 'merge',
  mergePrompt: 'Combine these responses into a comprehensive answer.',
});

const result = await consensus.generate('What is the capital of France?');
// All 3 models are queried in parallel, then the judge picks the best answer
```

ConsensusLLM Parameters

adapters: LLMAdapter[]

Array of models to query. All are called in parallel.

strategy: 'best_score' | 'majority' | 'merge'

How to select/combine responses. Default: best_score.

judge?: LLMAdapter

Required for 'best_score'. The model that evaluates and picks the best response.
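For intuition, the majority strategy can be thought of as a normalize-and-tally over the parallel responses. The sketch below is a hypothetical model of that idea; the exact normalization ConsensusLLM applies is not documented here:

```typescript
// Hypothetical majority vote: group responses by a normalized key,
// return the most frequent one (ties go to the earliest response).
function majorityVote(responses: string[]): string {
  const tally = new Map<string, { original: string; count: number }>();
  for (const response of responses) {
    const key = response.trim().toLowerCase(); // naive normalization
    const entry = tally.get(key) ?? { original: response, count: 0 };
    entry.count += 1;
    tally.set(key, entry);
  }
  let winner = responses[0];
  let winnerCount = 0;
  for (const { original, count } of tally.values()) {
    if (count > winnerCount) {
      winner = original;
      winnerCount = count;
    }
  }
  return winner;
}
```

This is also why majority suits short factual answers: long free-form responses rarely normalize to the same key, so no clear majority emerges.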

# RaceLLM — Fastest Response Wins

RaceLLM queries multiple models in parallel and returns the first successful response. Ideal for latency-sensitive applications where you want the fastest available model.

```typescript
import { RaceLLM } from 'orkajs/orchestration';

const race = new RaceLLM({
  adapters: [openai, anthropic, mistral],
  timeout: 10000, // 10s max wait time
});

const result = await race.generate('Quick question: what is 2+2?');
// Returns the first model to respond
// Other requests are cancelled (if supported by the adapter)

console.log(result.metadata?.adapter); // Which adapter won the race
```

RaceLLM Parameters

adapters: LLMAdapter[]

Array of models to race. All are called simultaneously.

timeout?: number

Maximum time to wait for any response (ms). Default: 30000.

# LoadBalancerLLM — Distribute Load

LoadBalancerLLM distributes requests across multiple adapters (often the same model with different API keys) to avoid rate limits and improve throughput.

```typescript
import { LoadBalancerLLM } from 'orkajs/orchestration';
import { OpenAIAdapter } from 'orkajs/adapters/openai';

// Multiple API keys for the same model
const key1 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_1! });
const key2 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_2! });
const key3 = new OpenAIAdapter({ apiKey: process.env.OPENAI_KEY_3! });

const balancer = new LoadBalancerLLM({
  adapters: [key1, key2, key3],
  strategy: 'round_robin', // Rotate through adapters
});

// Strategies:
// - 'round_robin': Cycle through adapters in order (1, 2, 3, 1, 2, 3...)
// - 'random': Pick a random adapter each time
// - 'least_tokens': Pick the adapter that has used the fewest tokens

// Make 100 requests — they'll be distributed across all 3 keys
for (let i = 0; i < 100; i++) {
  await balancer.generate('Hello!');
}
```

LoadBalancerLLM Parameters

adapters: LLMAdapter[]

Array of adapters to balance load across.

strategy: 'round_robin' | 'random' | 'least_tokens'

How to select the next adapter. Default: round_robin.
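The round_robin strategy is the simplest of the three to picture: keep a cursor and advance it modulo the pool size on every request. An illustrative selector (not OrkaJS source):

```typescript
// Illustrative round-robin cursor over a pool of n adapters:
// each call returns the next index, wrapping back to 0.
function makeRoundRobin(n: number): () => number {
  let cursor = 0;
  return () => {
    const pick = cursor;
    cursor = (cursor + 1) % n;
    return pick;
  };
}
```

A least_tokens strategy would instead track usage per adapter and pick the minimum, which smooths load when request sizes vary widely.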

# Comparison

| Orchestrator | Use Case | API Calls | Latency |
| --- | --- | --- | --- |
| RouterLLM | Cost optimization, task-specific routing | 1 | Single model |
| ConsensusLLM | High accuracy, critical decisions | N + 1 (judge) | Parallel + judge |
| RaceLLM | Minimum latency | N (parallel) | Fastest model |
| LoadBalancerLLM | Rate limit avoidance, high throughput | 1 | Single model |

# Tree-shaking Imports

Import only what you need to minimize bundle size:

```typescript
// ✅ Import only what you need
import { RouterLLM } from 'orkajs/orchestration/router';
import { ConsensusLLM } from 'orkajs/orchestration/consensus';
import { RaceLLM } from 'orkajs/orchestration/race';
import { LoadBalancerLLM } from 'orkajs/orchestration/load-balancer';

// ✅ Or import from index
import { RouterLLM, ConsensusLLM, RaceLLM, LoadBalancerLLM } from 'orkajs/orchestration';
```