Testing Framework
Deterministic testing for LLM agents
Write reliable, fast, and deterministic tests for your OrkaJS agents. Mock LLM responses, assert on agent behavior, and integrate with Vitest/Jest.
Installation
```bash
npm install -D @orka-js/test
# or
pnpm add -D @orka-js/test
```

Key Features
MockLLM
Deterministic LLM responses for predictable tests
AgentTestBed
Orchestrate agent tests with fluent assertions
Pattern Matching
Match prompts with strings, regex, or functions
Tool Call Mocking
Simulate tool calls and verify execution
Latency Simulation
Test timeout handling and retry logic
Call Assertions
Verify LLM was called with expected prompts
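As a mental model for how the pattern-matching feature works, here is a self-contained sketch of how a mock might resolve a prompt against string, regex, and function patterns. The `Rule` type and `resolve` function are illustrative names only, not the actual `@orka-js/test` internals:

```typescript
// Illustrative sketch (not the @orka-js/test implementation) of resolving
// a prompt to a canned response via string / regex / function patterns.
type Rule = {
  when?: string | RegExp | ((prompt: string) => boolean);
  output: string;
};

function matchRule(rule: Rule, prompt: string): boolean {
  if (rule.when === undefined) return true; // default fallback rule
  if (typeof rule.when === 'string') {
    // Case-insensitive substring match
    return prompt.toLowerCase().includes(rule.when.toLowerCase());
  }
  if (rule.when instanceof RegExp) return rule.when.test(prompt);
  return rule.when(prompt); // predicate function
}

function resolve(rules: Rule[], prompt: string): string {
  const rule = rules.find((r) => matchRule(r, prompt));
  if (!rule) throw new Error(`No mock response for prompt: ${prompt}`);
  return rule.output;
}

const rules: Rule[] = [
  { when: 'weather', output: 'Sunny' },
  { when: /capital of (\w+)/, output: 'Paris' },
  { when: (p) => p.includes('urgent'), output: 'High priority response' },
  { output: "I don't understand" },
];

console.log(resolve(rules, 'What is the weather?')); // "Sunny"
console.log(resolve(rules, 'Random question'));      // "I don't understand"
```

Rules are checked in order, so the catch-all rule (no `when`) should come last.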
# MockLLM — Deterministic Responses
Replace real LLM adapters with predictable mock responses.
mock-llm.test.ts
```ts
import { mockLLM } from '@orka-js/test';
import { StreamingToolAgent } from '@orka-js/agent';

// Create a mock LLM with predefined responses
const llm = mockLLM([
  { when: /weather/, output: 'It is sunny in Paris, 22°C' },
  { when: /capital of France/, output: 'The capital of France is Paris' },
  { when: /book/, toolCall: { name: 'bookDemo', args: { slot: 'tomorrow' } } },
]);

const agent = new StreamingToolAgent({
  goal: 'Answer questions',
  tools: [],
}, llm);

const result = await agent.run('What is the weather in Paris?');
console.log(result.output); // "It is sunny in Paris, 22°C"

// Verify the LLM was called
console.log(llm.getCallCount()); // 1
console.log(llm.wasCalledWith(/weather/)); // true
```

# AgentTestBed — Fluent Assertions
Test agents with a fluent API for assertions and snapshots.
agent.test.ts
```ts
import { AgentTestBed, mockLLM } from '@orka-js/test';
import { StreamingToolAgent } from '@orka-js/agent';
import { describe, it, expect } from 'vitest';

describe('Weather Agent', () => {
  it('should answer weather questions', async () => {
    const llm = mockLLM([
      { when: /weather/, output: 'Sunny, 22°C' },
    ]);

    const agent = new StreamingToolAgent({
      goal: 'Answer weather questions',
      tools: [],
    }, llm);

    const bed = new AgentTestBed({ agent, llm });
    const result = await bed.run('What is the weather?');

    // Fluent assertions
    result.toHaveOutput(/Sunny/);
    result.toHaveUsedLLM();
    result.toHaveTokenCount({ min: 10 });

    expect(llm.getCallCount()).toBe(1);
  });

  it('should call tools when needed', async () => {
    const llm = mockLLM([
      { when: /book/, toolCall: { name: 'bookDemo', args: { slot: 'tomorrow' } } },
    ]);

    const bookTool = {
      name: 'bookDemo',
      description: 'Book a demo',
      parameters: [{ name: 'slot', type: 'string', required: true }],
      execute: async ({ slot }) => ({ output: `Demo booked for ${slot}` }),
    };

    const agent = new StreamingToolAgent({
      goal: 'Help users book demos',
      tools: [bookTool],
    }, llm);

    const bed = new AgentTestBed({ agent, llm });
    const result = await bed.run('I want to book a demo');

    result.toHaveCalledTool('bookDemo');
    result.toHaveToolArgs('bookDemo', { slot: 'tomorrow' });
  });
});
```

Pattern Matching
Configure mock responses based on prompt patterns.
patterns.test.ts
```ts
import { mockLLM } from '@orka-js/test';

const llm = mockLLM([
  // String matching (case-insensitive substring)
  { when: 'weather', output: 'Sunny' },

  // Regex matching
  { when: /capital of (\w+)/, output: 'Paris' },

  // Function matching
  { when: (prompt) => prompt.includes('urgent'), output: 'High priority response' },

  // Default fallback (no 'when' condition)
  { output: "I don't understand" },
]);

await llm.generate('What is the weather?');           // "Sunny"
await llm.generate('What is the capital of France?'); // "Paris"
await llm.generate('This is urgent!');                // "High priority response"
await llm.generate('Random question');                // "I don't understand"
```

Tool Call Mocking
Simulate tool calls and verify agent behavior.
tool-calls.test.ts
```ts
import { mockLLM } from '@orka-js/test';

const llm = mockLLM([
  {
    when: /search for (.+)/,
    toolCall: {
      name: 'search_products',
      args: { query: 'headphones', maxPrice: 200 },
    },
  },
  {
    when: /multiple tools/,
    toolCall: [
      { name: 'tool1', args: { param: 'value1' } },
      { name: 'tool2', args: { param: 'value2' } },
    ],
  },
]);

// The agent will receive a tool_call event instead of text
for await (const event of llm.stream('search for headphones')) {
  if (event.type === 'tool_call') {
    console.log(event.name);      // "search_products"
    console.log(event.arguments); // '{"query":"headphones","maxPrice":200}'
  }
}
```

Latency & Error Simulation
latency.test.ts
```ts
import { mockLLM } from '@orka-js/test';

const llm = mockLLM([
  { when: /slow/, output: 'Delayed response', latencyMs: 2000 },
  { when: /error/, error: new Error('Simulated API error') },
]);

// Simulate a slow API
const start = Date.now();
await llm.generate('This is slow');
console.log(`Took ${Date.now() - start}ms`); // ~2000ms

// Simulate an error
try {
  await llm.generate('Trigger error');
} catch (err) {
  console.log(err.message); // "Simulated API error"
}
```

CI/CD Integration
Run tests in CI pipelines with Vitest or Jest.
ci-setup
vitest.config.ts
```ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    setupFiles: ['./test/setup.ts'],
  },
});
```

test/setup.ts
```ts
import { extendExpect } from '@orka-js/test';
extendExpect();
```

package.json
```json
{
  "scripts": {
    "test": "vitest run",
    "test:watch": "vitest",
    "test:coverage": "vitest run --coverage"
  }
}
```

.github/workflows/test.yml
```yaml
name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
      - run: npm ci
      - run: npm test
```

Best Practices
- Use MockLLM for unit tests and a real LLM for integration tests
- Test edge cases: errors, timeouts, tool failures
- Snapshot agent outputs for regression detection
- Run tests in CI on every commit
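Snapshotting works best when volatile fields are stripped before the snapshot is taken, so snapshots only change when the meaningful output changes. A minimal sketch, assuming a hypothetical result shape — `AgentResult` and `toSnapshot` are illustrative names, not `@orka-js/test` APIs:

```typescript
// Illustrative helper (not part of @orka-js/test): drop fields that vary
// run to run, so only deterministic output reaches the snapshot.
type AgentResult = {
  output: string;
  tokenCount?: number;
  durationMs?: number;
  timestamp?: string;
};

function toSnapshot(result: AgentResult): Record<string, unknown> {
  // Timing and token fields fluctuate even with a mock; keep the rest
  const { tokenCount, durationMs, timestamp, ...stable } = result;
  return stable;
}

const snap = toSnapshot({
  output: 'Sunny, 22°C',
  tokenCount: 42,
  durationMs: 118,
  timestamp: '2024-05-01T10:00:00Z',
});
console.log(JSON.stringify(snap)); // {"output":"Sunny, 22°C"}
```

In a Vitest test you would then write `expect(toSnapshot(result)).toMatchSnapshot()` against the normalized object.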