Multi-Model Orchestration
Route, race, load-balance, and combine multiple LLM providers intelligently.
Orka's orchestration layer lets you combine multiple LLM providers for optimal cost, latency, and quality.
1. RouterLLM - Intelligent Routing
Route requests to different models based on complexity, content, or custom conditions:
````ts
import { createOrka, AnthropicAdapter, MemoryVectorAdapter, RouterLLM } from 'orkajs';

const fast = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' });
const smart = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' });

const llm = new RouterLLM({
  routes: [
    {
      name: 'complex',
      condition: (prompt) =>
        prompt.length > 500 ||
        prompt.toLowerCase().includes('analyze') ||
        prompt.toLowerCase().includes('compare'),
      adapter: smart,
    },
    {
      name: 'code',
      condition: (prompt) =>
        prompt.includes('code') || prompt.includes('function') || prompt.includes('```'),
      adapter: smart,
    },
  ],
  defaultAdapter: fast, // Fallback for simple queries
});

const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });

// Simple query → claude-sonnet-4.5
const simple = await orka.generate('Say hello in one sentence.');
console.log('Simple: ' + simple);

// Complex query → claude-opus-4.5 (matches the 'analyze' condition)
const complex = await orka.generate('Analyze the advantages and disadvantages of TypeScript vs JavaScript.');
console.log('Complex: ' + complex);
````
2. ConsensusLLM - Best of N
Query multiple models and use a judge to select the best answer:
```ts
import { ConsensusLLM, AnthropicAdapter, MemoryVectorAdapter, createOrka } from 'orkajs';

const llm = new ConsensusLLM({
  adapters: [
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
  ],
  strategy: 'best_score', // 'best_score' | 'majority_vote' | 'first_valid'
  judge: new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
});

const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });

// Both models generate answers, and the judge picks the best one.
const result = await orka.generate('Explain recursion in programming.');
console.log('Consensus: ' + result.slice(0, 200) + '...');
```
3. RaceLLM - Fastest Wins
Race multiple models against each other and return the first response:
```ts
import { RaceLLM, AnthropicAdapter, MemoryVectorAdapter, createOrka } from 'orkajs';

const llm = new RaceLLM({
  adapters: [
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
  ],
  timeout: 10000, // Global timeout in ms
});

const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });

// The first model to respond wins; the others are cancelled.
const result = await orka.generate('What is TypeScript?');
console.log('Race winner: ' + result.slice(0, 200) + '...');
```
4. LoadBalancerLLM - Distribute the Load
Distribute requests across multiple model instances:
```ts
import { LoadBalancerLLM, AnthropicAdapter, MemoryVectorAdapter, createOrka } from 'orkajs';

const lb = new LoadBalancerLLM({
  adapters: [
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
  ],
  strategy: 'round_robin', // 'round_robin' | 'weighted' | 'random'
});

const orka = createOrka({ llm: lb, vectorDB: new MemoryVectorAdapter() });

// Requests are distributed round-robin.
for (let i = 0; i < 4; i++) {
  await orka.generate('Question ' + (i + 1));
}

// Usage statistics
console.log('Stats:', lb.getStats());
// { adapter_0: { requests: 2, avgLatency: 150 }, adapter_1: { requests: 2, avgLatency: 145 } }
```
5. Complete Example
````ts
import {
  createOrka,
  AnthropicAdapter,
  MemoryVectorAdapter,
  RouterLLM,
  ConsensusLLM,
  RaceLLM,
  LoadBalancerLLM,
} from 'orkajs';

async function routerExample() {
  console.log('🔀 Router: Route by complexity\n');
  const fast = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' });
  const smart = new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' });
  const llm = new RouterLLM({
    routes: [
      { name: 'complex', condition: (p) => p.length > 500 || p.toLowerCase().includes('analyze'), adapter: smart },
      { name: 'code', condition: (p) => p.includes('code') || p.includes('```'), adapter: smart },
    ],
    defaultAdapter: fast,
  });
  const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });
  const simple = await orka.generate('Say hello.');
  console.log('Simple (claude-sonnet-4.5): ' + simple);
}

async function consensusExample() {
  console.log('🤝 Consensus: Best answer of 2 models\n');
  const llm = new ConsensusLLM({
    adapters: [
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
    ],
    strategy: 'best_score',
    judge: new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
  });
  const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });
  const result = await orka.generate('Explain recursion.');
  console.log('Consensus: ' + result.slice(0, 200) + '...');
}

async function raceExample() {
  console.log('🏎️ Race: The fastest wins\n');
  const llm = new RaceLLM({
    adapters: [
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4.5' }),
    ],
    timeout: 10000,
  });
  const orka = createOrka({ llm, vectorDB: new MemoryVectorAdapter() });
  const result = await orka.generate('What is TypeScript?');
  console.log('Race winner: ' + result.slice(0, 200) + '...');
}

async function loadBalancerExample() {
  console.log('⚖️ Load Balancer: Round-robin\n');
  const lb = new LoadBalancerLLM({
    adapters: [
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
      new AnthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-sonnet-4.5' }),
    ],
    strategy: 'round_robin',
  });
  const orka = createOrka({ llm: lb, vectorDB: new MemoryVectorAdapter() });
  for (let i = 0; i < 4; i++) {
    await orka.generate('Question ' + (i + 1));
  }
  console.log('Stats:', lb.getStats());
}

async function main() {
  await routerExample();
  await consensusExample();
  await raceExample();
  await loadBalancerExample();
}

main().catch(console.error);
````
When to Use Each Strategy
Cost optimization (RouterLLM): cheap models for simple queries, expensive models for complex ones.
Quality-critical (ConsensusLLM): get the best answer by comparing several models.
Latency-critical (RaceLLM): get the fastest answer, whichever model provides it.
High throughput (LoadBalancerLLM): spread the load across multiple API keys or instances.
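The mechanics behind two of these strategies can be sketched without orkajs at all. The snippet below is a minimal, library-independent illustration (the `Adapter` type, `race`, `roundRobin`, and `fake` helpers here are hypothetical, not part of the orkajs API): racing is essentially `Promise.race` over the adapters, and round-robin balancing is a rotating index.

```typescript
// A hypothetical stand-in for an LLM adapter: a named async generate function.
type Adapter = { name: string; generate: (prompt: string) => Promise<string> };

// RaceLLM mechanic: the first settled response wins. A real implementation
// would also cancel the losing requests (e.g. via AbortController).
async function race(adapters: Adapter[], prompt: string): Promise<string> {
  return Promise.race(adapters.map((a) => a.generate(prompt)));
}

// LoadBalancerLLM mechanic: round-robin selection with a rotating index.
function roundRobin(adapters: Adapter[]): () => Adapter {
  let i = 0;
  return () => adapters[i++ % adapters.length];
}

// Fake adapter that "responds" after a delay, for demonstration.
const fake = (name: string, ms: number): Adapter => ({
  name,
  generate: (p) => new Promise((res) => setTimeout(() => res(`${name}: ${p}`), ms)),
});

async function demo() {
  const fastest = await race([fake('slow', 50), fake('quick', 5)], 'hi');
  console.log(fastest); // the 5 ms adapter answers first: "quick: hi"

  const next = roundRobin([fake('a', 0), fake('b', 0)]);
  console.log(next().name, next().name, next().name); // a b a
}

demo();
```

Composing the real classes follows the same logic: each orchestrator exposes the same generate-style interface as a plain adapter, which is why they can be passed directly to `createOrka`.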