Testing Framework

Deterministic tests for LLM agents

Write reliable, fast, deterministic tests for your OrkaJS agents. Mock LLM responses, assert on agent behavior, and integrate with Vitest or Jest.

Installation

npm install -D @orka-js/test
# or
pnpm add -D @orka-js/test

Key Features

MockLLM

Deterministic LLM responses for predictable tests

AgentTestBed

Orchestrate tests with fluent assertions

Pattern Matching

Match prompts with strings, regular expressions, or functions

Mock Tool Calls

Simulate tool calls and verify that they were executed

Latency Simulation

Test timeout and retry handling

Call Assertions

Verify that the LLM was called with the expected prompts

# MockLLM: Deterministic Responses

Replace real LLM adapters with predictable mock responses.

mock-llm.test.ts

import { mockLLM } from '@orka-js/test';
import { StreamingToolAgent } from '@orka-js/agent';
 
// Create a mock LLM with predefined responses
const llm = mockLLM([
  { when: /weather/, output: 'It is sunny in Paris, 22°C' },
  { when: /capital of France/, output: 'The capital of France is Paris' },
  { when: /book/, toolCall: { name: 'bookDemo', args: { slot: 'tomorrow' } } },
]);
 
const agent = new StreamingToolAgent({
  goal: 'Answer questions',
  tools: [],
}, llm);
 
const result = await agent.run('What is the weather in Paris?');
console.log(result.output); // "It is sunny in Paris, 22°C"
 
// Verify LLM was called
console.log(llm.getCallCount()); // 1
console.log(llm.wasCalledWith(/weather/)); // true

# AgentTestBed: Fluent Assertions

Test agents with a fluent API for assertions and snapshots.

agent.test.ts

import { AgentTestBed, mockLLM } from '@orka-js/test';
import { StreamingToolAgent } from '@orka-js/agent';
import { describe, it, expect } from 'vitest';
 
describe('Weather Agent', () => {
  it('should answer weather questions', async () => {
    const llm = mockLLM([
      { when: /weather/, output: 'Sunny, 22°C' },
    ]);
 
    const agent = new StreamingToolAgent({
      goal: 'Answer weather questions',
      tools: [],
    }, llm);
 
    const bed = new AgentTestBed({ agent, llm });
    const result = await bed.run('What is the weather?');
 
    // Fluent assertions
    result.toHaveOutput(/Sunny/);
    result.toHaveUsedLLM();
    result.toHaveTokenCount({ min: 10 });
 
    expect(llm.getCallCount()).toBe(1);
  });
 
  it('should call tools when needed', async () => {
    const llm = mockLLM([
      { when: /book/, toolCall: { name: 'bookDemo', args: { slot: 'tomorrow' } } },
    ]);
 
    const bookTool = {
      name: 'bookDemo',
      description: 'Book a demo',
      parameters: [{ name: 'slot', type: 'string', required: true }],
      execute: async ({ slot }: { slot: string }) => ({ output: `Demo booked for ${slot}` }),
    };
 
    const agent = new StreamingToolAgent({
      goal: 'Help users book demos',
      tools: [bookTool],
    }, llm);
 
    const bed = new AgentTestBed({ agent, llm });
    const result = await bed.run('I want to book a demo');
 
    result.toHaveCalledTool('bookDemo');
    result.toHaveToolArgs('bookDemo', { slot: 'tomorrow' });
  });
});

Pattern Matching

Configure mock responses based on prompt patterns.

patterns.test.ts

import { mockLLM } from '@orka-js/test';
 
const llm = mockLLM([
  // String matching (case-insensitive substring)
  { when: 'weather', output: 'Sunny' },
 
  // Regex matching
  { when: /capital of (\w+)/, output: 'Paris' },
 
  // Function matching
  { when: (prompt) => prompt.includes('urgent'), output: 'High priority response' },
 
  // Default fallback (no 'when' condition)
  { output: "I don't understand" },
]);
 
await llm.generate('What is the weather?'); // "Sunny"
await llm.generate('What is the capital of France?'); // "Paris"
await llm.generate('This is urgent!'); // "High priority response"
await llm.generate('Random question'); // "I don't understand"
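To make the matching rules concrete, here is a minimal, self-contained sketch of how a rule list like the one above could be resolved. It is not the `@orka-js/test` implementation: the `Rule` type and the `matches` and `resolve` helpers are hypothetical, and the sketch assumes that rules are checked in order with the first match winning, which is why the fallback sits last.

```typescript
// Illustrative only: a hand-rolled matcher, not the @orka-js/test internals.
type Matcher = string | RegExp | ((prompt: string) => boolean);

interface Rule {
  when?: Matcher; // omitted means "fallback rule"
  output: string;
}

// Returns true when the prompt satisfies the matcher.
function matches(when: Matcher | undefined, prompt: string): boolean {
  if (when === undefined) return true; // a fallback matches every prompt
  if (typeof when === 'string') {
    // Case-insensitive substring match
    return prompt.toLowerCase().includes(when.toLowerCase());
  }
  if (when instanceof RegExp) {
    // Copy the regex so a global or sticky flag cannot carry lastIndex between calls
    return new RegExp(when.source, when.flags).test(prompt);
  }
  return when(prompt);
}

// Assumption: rules are evaluated in order and the first match wins.
function resolve(rules: Rule[], prompt: string): string | undefined {
  return rules.find((rule) => matches(rule.when, prompt))?.output;
}

const rules: Rule[] = [
  { when: 'weather', output: 'Sunny' },
  { when: /capital of (\w+)/, output: 'Paris' },
  { when: (prompt) => prompt.includes('urgent'), output: 'High priority response' },
  { output: "I don't understand" },
];

console.log(resolve(rules, 'What is the WEATHER?')); // "Sunny"
console.log(resolve(rules, 'Random question')); // "I don't understand"
```

Under that first-match assumption, a broad rule placed early would shadow every rule after it, so ordering rules from most to least specific keeps tests predictable.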

Mocking Tool Calls

Simulate tool calls and verify the agent's behavior.

tool-calls.test.ts

import { mockLLM } from '@orka-js/test';
 
const llm = mockLLM([
  {
    when: /search for (.+)/,
    toolCall: {
      name: 'search_products',
      args: { query: 'headphones', maxPrice: 200 },
    },
  },
  {
    when: /multiple tools/,
    toolCall: [
      { name: 'tool1', args: { param: 'value1' } },
      { name: 'tool2', args: { param: 'value2' } },
    ],
  },
]);
 
// The agent will receive a tool_call event instead of text
for await (const event of llm.stream('search for headphones')) {
  if (event.type === 'tool_call') {
    console.log(event.name); // "search_products"
    console.log(event.arguments); // '{"query":"headphones","maxPrice":200}'
  }
}
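Because `event.arguments` arrives as a JSON string in the example above, comparing it to an expected string is brittle: key order alone can break the test. The sketch below parses the payload and compares it structurally. The event shape is copied from the example; `ToolCallEvent`, `parseToolArgs`, and `sameArgs` are hypothetical helper names, not part of the library.

```typescript
// Event shape taken from the streaming example; helper names are hypothetical.
interface ToolCallEvent {
  type: 'tool_call';
  name: string;
  arguments: string; // JSON-encoded argument object
}

// Parse the JSON payload and fail loudly when it is not a plain object.
function parseToolArgs(event: ToolCallEvent): Record<string, unknown> {
  const parsed: unknown = JSON.parse(event.arguments);
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error(`Tool "${event.name}" arguments are not a JSON object`);
  }
  return parsed as Record<string, unknown>;
}

// Order-insensitive comparison for flat argument objects with primitive values.
function sameArgs(
  actual: Record<string, unknown>,
  expected: Record<string, unknown>,
): boolean {
  const actualKeys = Object.keys(actual);
  const expectedKeys = Object.keys(expected);
  return (
    actualKeys.length === expectedKeys.length &&
    expectedKeys.every((key) => key in actual && actual[key] === expected[key])
  );
}

// Same arguments as in the example, but serialized in a different key order.
const event: ToolCallEvent = {
  type: 'tool_call',
  name: 'search_products',
  arguments: '{"maxPrice":200,"query":"headphones"}',
};

const args = parseToolArgs(event);
console.log(sameArgs(args, { query: 'headphones', maxPrice: 200 })); // true
```

Nested argument objects would need a recursive comparison; inside a Vitest or Jest test, `expect(args).toEqual(...)` already covers that case.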

Latency & Error Simulation

latency.test.ts

import { mockLLM } from '@orka-js/test';
 
const llm = mockLLM([
  { when: /slow/, output: 'Delayed response', latencyMs: 2000 },
  { when: /error/, error: new Error('Simulated API error') },
]);
 
// Simulate slow API
const start = Date.now();
await llm.generate('This is slow');
console.log(`Took ${Date.now() - start}ms`); // ~2000ms
 
// Simulate errors
try {
  await llm.generate('Trigger error');
} catch (err) {
  console.log((err as Error).message); // "Simulated API error"
}
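Simulated latency is mostly useful for exercising the caller's own timeout and retry logic. The sketch below shows one way such logic might look and how a slow first response could trigger it. Everything here is hypothetical and self-contained: `withTimeout`, `retry`, and `fakeGenerate` are illustrative helpers, and `fakeGenerate` merely stands in for a mock configured with `latencyMs`.

```typescript
// Hypothetical helpers for illustration; they are not part of @orka-js/test.

// Rejects if `task` has not settled within `ms` milliseconds.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
    task.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Calls `fn` up to `attempts` times (assumed >= 1) and rethrows the last error.
async function retry<T>(fn: (attempt: number) => Promise<T>, attempts: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Stand-in for a mocked LLM: slow on the first call, fast afterwards.
function fakeGenerate(attempt: number): Promise<string> {
  const delayMs = attempt === 1 ? 200 : 5;
  return new Promise((resolve) => {
    setTimeout(() => resolve(`ok on attempt ${attempt}`), delayMs);
  });
}

// The first attempt exceeds the 50 ms budget, so the second attempt answers.
async function demo(): Promise<string> {
  return retry((attempt) => withTimeout(fakeGenerate(attempt), 50), 3);
}

demo().then((text) => console.log(text)); // "ok on attempt 2"
```

Keeping the delays in the tens of milliseconds, rather than the 2000 ms used above, keeps the suite fast while still exercising the timeout path.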

CI/CD Integration

Run the tests in CI pipelines with Vitest or Jest.

ci-setup

// vitest.config.ts
import { defineConfig } from 'vitest/config';
 
export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    setupFiles: ['./test/setup.ts'],
  },
});
 
// test/setup.ts
import { extendExpect } from '@orka-js/test';
extendExpect();
 
// package.json
{
  "scripts": {
    "test": "vitest run",
    "test:watch": "vitest",
    "test:coverage": "vitest run --coverage"
  }
}
 
# GitHub Actions (.github/workflows/test.yml)
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm ci
      - run: npm test

Best Practices

Use MockLLM for unit tests and a real LLM for integration tests
Test edge cases: errors, timeouts, and tool failures
Snapshot agent outputs to catch regressions
Run the tests in CI on every commit
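Snapshots only catch regressions if they stay identical between healthy runs, so fields that vary on every run are usually stripped first. The sketch below illustrates the idea under stated assumptions: the `AgentRunResult` shape and its `latencyMs` and `runId` fields are hypothetical, as is the `toSnapshot` helper; real agent results may carry different fields.

```typescript
// Hypothetical result shape, for illustration only.
interface AgentRunResult {
  output: string;
  toolCalls: { name: string; args: Record<string, unknown> }[];
  latencyMs: number; // varies between runs
  runId: string; // varies between runs
}

// Keep only the fields that should be identical between runs, then serialize them.
function toSnapshot(result: AgentRunResult): string {
  const stable = {
    output: result.output.trim(),
    toolCalls: result.toolCalls.map(({ name, args }) => ({ name, args })),
  };
  return JSON.stringify(stable, null, 2);
}

const firstRun: AgentRunResult = {
  output: 'Demo booked for tomorrow\n',
  toolCalls: [{ name: 'bookDemo', args: { slot: 'tomorrow' } }],
  latencyMs: 12,
  runId: 'run-1',
};

// Same behavior, different timing and identifier.
const secondRun: AgentRunResult = {
  ...firstRun,
  output: 'Demo booked for tomorrow',
  latencyMs: 950,
  runId: 'run-2',
};

console.log(toSnapshot(firstRun) === toSnapshot(secondRun)); // true
```

In a Vitest or Jest test, the resulting string could be passed to `expect(...).toMatchSnapshot()` so that only changes in output or tool usage show up as diffs.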