# OrkaJS Testing Framework

Deterministic testing for LLM agents.

Write reliable, fast, and deterministic tests for your OrkaJS agents: mock LLM responses, assert on agent behavior, and integrate with Vitest or Jest.

# Installation

```sh
npm install -D @orka-js/test
# or
pnpm add -D @orka-js/test
```

# Key Features

- **MockLLM**: deterministic LLM responses for predictable tests
- **AgentTestBed**: orchestrate agent tests with fluent assertions
- **Pattern Matching**: match prompts with strings, regexes, or functions
- **Tool Call Mocking**: simulate tool calls and verify execution
- **Latency Simulation**: test timeout handling and retry logic
- **Call Assertions**: verify the LLM was called with expected prompts

# MockLLM — Deterministic Responses

Replace real LLM adapters with predictable mock responses.

mock-llm.test.ts

```ts
import { mockLLM } from '@orka-js/test';
import { StreamingToolAgent } from '@orka-js/agent';

// Create a mock LLM with predefined responses
const llm = mockLLM([
  { when: /weather/, output: 'It is sunny in Paris, 22°C' },
  { when: /capital of France/, output: 'The capital of France is Paris' },
  { when: /book/, toolCall: { name: 'bookDemo', args: { slot: 'tomorrow' } } },
]);

const agent = new StreamingToolAgent({
  goal: 'Answer questions',
  tools: [],
}, llm);

const result = await agent.run('What is the weather in Paris?');
console.log(result.output); // "It is sunny in Paris, 22°C"

// Verify the LLM was called
console.log(llm.getCallCount()); // 1
console.log(llm.wasCalledWith(/weather/)); // true
```

# AgentTestBed — Fluent Assertions

Test agents with a fluent API for assertions and snapshots.

agent.test.ts

```ts
import { AgentTestBed, mockLLM } from '@orka-js/test';
import { StreamingToolAgent } from '@orka-js/agent';
import { describe, it, expect } from 'vitest';

describe('Weather Agent', () => {
  it('should answer weather questions', async () => {
    const llm = mockLLM([
      { when: /weather/, output: 'Sunny, 22°C' },
    ]);

    const agent = new StreamingToolAgent({
      goal: 'Answer weather questions',
      tools: [],
    }, llm);

    const bed = new AgentTestBed({ agent, llm });
    const result = await bed.run('What is the weather?');

    // Fluent assertions
    result.toHaveOutput(/Sunny/);
    result.toHaveUsedLLM();
    result.toHaveTokenCount({ min: 10 });

    expect(llm.getCallCount()).toBe(1);
  });

  it('should call tools when needed', async () => {
    const llm = mockLLM([
      { when: /book/, toolCall: { name: 'bookDemo', args: { slot: 'tomorrow' } } },
    ]);

    const bookTool = {
      name: 'bookDemo',
      description: 'Book a demo',
      parameters: [{ name: 'slot', type: 'string', required: true }],
      execute: async ({ slot }) => ({ output: `Demo booked for ${slot}` }),
    };

    const agent = new StreamingToolAgent({
      goal: 'Help users book demos',
      tools: [bookTool],
    }, llm);

    const bed = new AgentTestBed({ agent, llm });
    const result = await bed.run('I want to book a demo');

    result.toHaveCalledTool('bookDemo');
    result.toHaveToolArgs('bookDemo', { slot: 'tomorrow' });
  });
});
```

# Pattern Matching

Configure mock responses based on prompt patterns.

patterns.test.ts

```ts
import { mockLLM } from '@orka-js/test';

const llm = mockLLM([
  // String matching (case-insensitive substring)
  { when: 'weather', output: 'Sunny' },

  // Regex matching
  { when: /capital of (\w+)/, output: 'Paris' },

  // Function matching
  { when: (prompt) => prompt.includes('urgent'), output: 'High priority response' },

  // Default fallback (no 'when' condition)
  { output: "I don't understand" },
]);

await llm.generate('What is the weather?'); // "Sunny"
await llm.generate('What is the capital of France?'); // "Paris"
await llm.generate('This is urgent!'); // "High priority response"
await llm.generate('Random question'); // "I don't understand"
```
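To make the resolution rules concrete, here is a minimal, self-contained sketch of this matching behavior (first matching rule wins; a rule without `when` acts as the fallback). It is illustrative only, not the actual @orka-js/test implementation, and the `matches`/`resolve` names are invented for this sketch:

```typescript
// Illustrative sketch of mockLLM-style rule resolution.
// Not the real @orka-js/test internals.
type Matcher = string | RegExp | ((prompt: string) => boolean);

interface MockRule {
  when?: Matcher; // omitted => fallback rule
  output: string;
}

function matches(when: Matcher, prompt: string): boolean {
  if (typeof when === 'string') {
    // String rules match as case-insensitive substrings
    return prompt.toLowerCase().includes(when.toLowerCase());
  }
  if (when instanceof RegExp) {
    return when.test(prompt);
  }
  return when(prompt);
}

// Rules are checked in order; the first match wins, and a rule
// without `when` matches everything, so it serves as the default.
function resolve(rules: MockRule[], prompt: string): string | undefined {
  for (const rule of rules) {
    if (rule.when === undefined || matches(rule.when, prompt)) {
      return rule.output;
    }
  }
  return undefined;
}
```

Because rules are checked in order, the fallback entry should always come last; placing it first would shadow every other rule.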

# Tool Call Mocking

Simulate tool calls and verify agent behavior.

tool-calls.test.ts

```ts
import { mockLLM } from '@orka-js/test';

const llm = mockLLM([
  {
    when: /search for (.+)/,
    toolCall: {
      name: 'search_products',
      args: { query: 'headphones', maxPrice: 200 },
    },
  },
  {
    when: /multiple tools/,
    toolCall: [
      { name: 'tool1', args: { param: 'value1' } },
      { name: 'tool2', args: { param: 'value2' } },
    ],
  },
]);

// The agent receives a tool_call event instead of text
for await (const event of llm.stream('search for headphones')) {
  if (event.type === 'tool_call') {
    console.log(event.name); // "search_products"
    console.log(event.arguments); // '{"query":"headphones","maxPrice":200}'
  }
}
```

# Latency & Error Simulation

latency.test.ts

```ts
import { mockLLM } from '@orka-js/test';

const llm = mockLLM([
  { when: /slow/, output: 'Delayed response', latencyMs: 2000 },
  { when: /error/, error: new Error('Simulated API error') },
]);

// Simulate a slow API
const start = Date.now();
await llm.generate('This is slow');
console.log(`Took ${Date.now() - start}ms`); // ~2000ms

// Simulate errors
try {
  await llm.generate('Trigger error');
} catch (err) {
  console.log(err.message); // "Simulated API error"
}
```

# CI/CD Integration

Run tests in CI pipelines with Vitest or Jest.

vitest.config.ts

```ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    setupFiles: ['./test/setup.ts'],
  },
});
```

test/setup.ts

```ts
import { extendExpect } from '@orka-js/test';
extendExpect();
```

package.json

```json
{
  "scripts": {
    "test": "vitest run",
    "test:watch": "vitest",
    "test:coverage": "vitest run --coverage"
  }
}
```

.github/workflows/test.yml

```yaml
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm ci
      - run: npm test
```

# Best Practices

- Use MockLLM for unit tests and a real LLM for integration tests
- Test edge cases: errors, timeouts, and tool failures
- Snapshot agent outputs for regression detection
- Run tests in CI on every commit
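The snapshot practice deserves a concrete illustration. With Vitest you would typically call `expect(result.output).toMatchSnapshot()` on a MockLLM-backed run; the underlying idea (compare a deterministic output against a stored golden value and fail on drift) can also be sketched without any test runner. The names below are invented for this sketch and are not part of @orka-js/test:

```typescript
// Illustrative golden-value regression check: a hand-rolled stand-in for a
// snapshot assertion. Because MockLLM responses are deterministic, the same
// prompt always yields the same output, so strict equality is safe.
const goldenSnapshots: Record<string, string> = {
  'weather-question': 'It is sunny in Paris, 22°C', // previously approved output
};

function assertMatchesSnapshot(key: string, actual: string): void {
  const expected = goldenSnapshots[key];
  if (expected === undefined) {
    throw new Error(`No snapshot recorded for "${key}"`);
  }
  if (actual !== expected) {
    throw new Error(
      `Snapshot mismatch for "${key}":\n  expected: ${expected}\n  actual:   ${actual}`
    );
  }
}
```

With a real (non-mocked) LLM this kind of strict comparison would be flaky, which is exactly why the snapshot practice pairs with MockLLM.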