Durable Agents
Persistent, resumable agent execution
Build fault-tolerant agents that survive crashes, support pause/resume, and persist state across restarts. Well suited to long-running workflows, scheduled tasks, and mission-critical operations.
Installation
```bash
npm install @orka-js/durable
# or
pnpm add @orka-js/durable

# Optional: for Redis storage
npm install redis

# Optional: for cron scheduling
npm install node-cron
```
Key Features
- Crash Recovery: agents resume from the last checkpoint after a crash
- Pause/Resume: pause execution and resume later with new input
- Job Persistence: store job state in memory or in Redis
- Retry Logic: automatic retries with exponential backoff
- Scheduled Jobs: cron-based scheduling for recurring tasks
- Streaming Support: stream events while persisting progress
Basic Usage
Wrap any agent with DurableAgent for persistence.
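Conceptually, wrapping an agent means every run becomes a job record that is saved before the agent starts and updated with the outcome afterwards. The sketch below illustrates that idea independently of the library; it is not the @orka-js/durable implementation. `JobRecord`, `save`, `runDurably`, and the `Map` used as a store are hypothetical, and the agent is modeled as a plain async function.

```typescript
// Hypothetical sketch of persisting a run as a job record; not the @orka-js/durable internals.
type JobStatus = 'pending' | 'running' | 'completed' | 'failed';

interface JobRecord {
  id: string;
  input: string;
  status: JobStatus;
  result?: string;
  error?: string;
  createdAt: Date;
  updatedAt: Date;
  completedAt?: Date;
}

// Stand-in for a store: a real backend would serialize the record somewhere durable.
const jobs = new Map<string, JobRecord>();

function save(job: JobRecord): JobRecord {
  job.updatedAt = new Date();
  jobs.set(job.id, { ...job }); // keep a snapshot, not a live reference
  return job;
}

// Records the job before running, then saves the outcome whether it succeeds or throws.
async function runDurably(
  id: string,
  input: string,
  agent: (input: string) => Promise<string>,
): Promise<JobRecord> {
  const now = new Date();
  const job = save({ id, input, status: 'pending', createdAt: now, updatedAt: now });

  job.status = 'running';
  save(job);
  try {
    job.result = await agent(input);
    job.status = 'completed';
    job.completedAt = new Date();
  } catch (err) {
    job.status = 'failed';
    job.error = err instanceof Error ? err.message : String(err);
  }
  return save(job);
}

// Usage with a fake agent that upper-cases its input.
runDurably('job_123', 'hello', async (text) => text.toUpperCase()).then((job) => {
  console.log(job.status, job.result); // completed HELLO
});
```

Because the record is written before the agent runs, a crash leaves a job in the `running` state that can be found and handled later, which is the property a durable wrapper relies on.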
durable-agent.ts
```typescript
import { DurableAgent, MemoryStore } from '@orka-js/durable';
import { StreamingToolAgent } from '@orka-js/agent';
import { OpenAIAdapter } from '@orka-js/openai';

const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });

// ticketTool and emailTool are your own tool definitions (not shown)
const agent = new StreamingToolAgent({
  goal: 'Process customer support tickets',
  tools: [ticketTool, emailTool],
}, llm);

// Wrap with DurableAgent
const store = new MemoryStore();
const durableAgent = new DurableAgent(agent, store, {
  maxRetries: 3,
  retryDelayMs: 2000,
  metadata: { team: 'support' },
});

// Run a job
const job = await durableAgent.run('job_123', 'Handle ticket #456');
console.log(job.status); // "completed"
console.log(job.result); // Agent output

// Check job status later
const status = await durableAgent.status('job_123');
console.log(status?.completedAt);
```
Storage Backends
Choose between in-memory and Redis storage.
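Both backends play the same role: a key-value store for job records. The hypothetical `TtlJobStore` below is a library-independent sketch of how a key prefix and a time-to-live, the roles `keyPrefix` and `ttlSeconds` presumably play in the RedisStore options, affect stored jobs. It is not RedisStore's implementation; the clock is injectable so expiry can be shown deterministically.

```typescript
// Hypothetical in-memory store that mimics Redis-style key prefixes and TTL expiry.
interface StoredJob {
  id: string;
  status: string;
}

class TtlJobStore {
  private readonly entries = new Map<string, { job: StoredJob; expiresAt: number }>();
  private readonly keyPrefix: string;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(keyPrefix: string, ttlSeconds: number, now: () => number = Date.now) {
    this.keyPrefix = keyPrefix;
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, in milliseconds
  }

  private key(id: string): string {
    return `${this.keyPrefix}${id}`;
  }

  set(job: StoredJob): void {
    // Each write sets a fresh expiry, like Redis SET with EX.
    this.entries.set(this.key(job.id), { job: { ...job }, expiresAt: this.now() + this.ttlMs });
  }

  get(id: string): StoredJob | undefined {
    const entry = this.entries.get(this.key(id));
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(this.key(id)); // lazily evict expired entries
      return undefined;
    }
    return { ...entry.job };
  }

  keys(): string[] {
    return Array.from(this.entries.keys());
  }
}

// Usage: a 24h TTL with a fake clock.
let clock = 0;
const store = new TtlJobStore('orka:jobs:', 86400, () => clock);
store.set({ id: 'job_456', status: 'completed' });
console.log(store.keys()); // [ 'orka:jobs:job_456' ]
clock = 86400 * 1000; // advance the fake clock by 24h
console.log(store.get('job_456')); // undefined
```

The prefix keeps job keys in their own namespace inside a shared Redis instance, and the TTL bounds how long finished jobs occupy memory.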
stores.ts
```typescript
import { DurableAgent, MemoryStore, RedisStore } from '@orka-js/durable';
import { createClient } from 'redis';

// ─── In-Memory Store (default) ───────────────────────────────────────────────
const memoryStore = new MemoryStore();
// Jobs are lost on restart; good for development

// ─── Redis Store (production) ─────────────────────────────────────────────────
const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();

const redisStore = new RedisStore(redis, {
  keyPrefix: 'orka:jobs:',
  ttlSeconds: 86400, // Jobs expire after 24h
});

// `agent` is the agent created under Basic Usage
const durableAgent = new DurableAgent(agent, redisStore);

// Jobs persist across restarts
const job = await durableAgent.run('job_456', 'Long-running task');

// Later, even after a server restart:
const recovered = await durableAgent.status('job_456');
console.log(recovered?.status); // "completed"
```
Pause & Resume
Pause long-running jobs and resume them later.
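Pausing safely generally means work is split into steps and progress is checkpointed between them, so a resumed run skips what is already done. The sketch below shows that cooperative pattern with a hypothetical `Checkpoint` record and `processItems` function; it illustrates the general technique and does not describe how @orka-js/durable stores agent state.

```typescript
// Hypothetical cooperative pause/resume: work is processed step by step and the
// cursor advances after each step, so a later call continues where it stopped.
interface Checkpoint {
  cursor: number; // index of the next item to process
  results: string[];
}

function processItems(
  items: string[],
  checkpoint: Checkpoint,
  shouldPause: () => boolean,
): 'paused' | 'completed' {
  while (checkpoint.cursor < items.length) {
    if (shouldPause()) return 'paused'; // stop between steps, never mid-step
    checkpoint.results.push(items[checkpoint.cursor].toUpperCase()); // the "work"
    checkpoint.cursor += 1; // a real system would persist the checkpoint here
  }
  return 'completed';
}

// Usage: pause after two steps, then resume from the saved checkpoint.
const items = ['a', 'b', 'c', 'd'];
const saved: Checkpoint = { cursor: 0, results: [] };
let steps = 0;
console.log(processItems(items, saved, () => steps++ >= 2)); // paused
console.log(saved.cursor); // 2
console.log(processItems(items, saved, () => false)); // completed
console.log(saved.results); // [ 'A', 'B', 'C', 'D' ]
```

Checking the pause flag only between steps keeps each step atomic, so the checkpoint never points into half-finished work.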
pause-resume.ts
```typescript
import { DurableAgent, MemoryStore } from '@orka-js/durable';

const durableAgent = new DurableAgent(agent, new MemoryStore());

// Start a job (for example, in one request handler)
const job = await durableAgent.run('job_789', 'Analyze large dataset');

// The user asks to pause (for example, from another request)
await durableAgent.pause('job_789');
console.log('Job paused; state saved');

// Later, resume with optional new input
const resumed = await durableAgent.resume('job_789', 'Continue with updated params');
console.log(resumed.status); // "completed"

// Cancel a job
await durableAgent.cancel('job_789');

// List all jobs, or filter by status
const allJobs = await durableAgent.list();
const pending = await durableAgent.list({ status: 'pending' });
const failed = await durableAgent.list({ status: 'failed' });
```
Scheduled Jobs
Run agents on a cron schedule.
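The expression `'0 9 * * MON'` uses the five standard cron fields (minute, hour, day of month, month, day of week), so it fires at minute 0 of hour 9 on Mondays. As a worked illustration, the hypothetical helper below computes the next such time in UTC. Real schedulers such as node-cron handle the full cron syntax and typically evaluate expressions in the server's local time zone unless a time zone is configured.

```typescript
// Hypothetical helper: next occurrence of "0 9 * * MON" (Mondays 09:00), computed in UTC.
function nextMondayAt9Utc(from: Date): Date {
  // Start from 09:00 UTC on the same calendar day as `from`.
  const next = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate(), 9, 0, 0, 0));
  // getUTCDay(): 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  const daysUntilMonday = (1 - next.getUTCDay() + 7) % 7;
  next.setUTCDate(next.getUTCDate() + daysUntilMonday); // rolls over month/year boundaries
  // If that Monday 09:00 is not strictly after `from`, use the following week.
  if (next.getTime() <= from.getTime()) {
    next.setUTCDate(next.getUTCDate() + 7);
  }
  return next;
}

// 2024-01-03 is a Wednesday, so the next run is the following Monday.
console.log(nextMondayAt9Utc(new Date('2024-01-03T12:00:00Z')).toISOString());
// 2024-01-08T09:00:00.000Z
```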
scheduling.ts
```typescript
import { DurableAgent, MemoryStore } from '@orka-js/durable';
// Cron scheduling relies on the optional node-cron package (npm install node-cron)

const durableAgent = new DurableAgent(agent, new MemoryStore(), {
  schedule: '0 9 * * MON', // Every Monday at 9 AM
  onSchedule: async () => {
    // Generate input for the scheduled run (`db` stands in for your own data layer)
    const tickets = await db.getUnresolvedTickets();
    return `Process ${tickets.length} pending tickets`;
  },
});

// Start the scheduler
durableAgent.startScheduler();
console.log('Scheduler started; agent runs every Monday at 9 AM');

// Stop the scheduler (for example, on shutdown)
durableAgent.stopScheduler();
```
Streaming with Persistence
Stream events while saving job state.
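One way to persist while streaming is to wrap the agent's event stream and save the job status at its boundaries: when the run starts, when it finishes, and when it throws. The sketch below shows that shape with an async generator. The `token`/`done`/`job_status` event names mirror the ones used in streaming.ts, but `persistentStream`, the event types, and the `save` callback are hypothetical and not the library's API.

```typescript
// Hypothetical sketch: wrap an event stream and persist job status at start, end, and on error.
type StreamEvent =
  | { type: 'token'; token: string }
  | { type: 'done'; content: string };

type Status = 'running' | 'completed' | 'failed';
type JobStatusEvent = { type: 'job_status'; status: Status };

async function* persistentStream(
  source: AsyncIterable<StreamEvent>,
  save: (status: Status, detail?: string) => void,
): AsyncGenerator<StreamEvent | JobStatusEvent> {
  save('running');
  yield { type: 'job_status', status: 'running' };
  try {
    for await (const event of source) {
      yield event; // forward agent events unchanged
    }
  } catch (err) {
    save('failed', err instanceof Error ? err.message : String(err));
    yield { type: 'job_status', status: 'failed' };
    return;
  }
  save('completed');
  yield { type: 'job_status', status: 'completed' };
}

// A fake agent stream used for the usage example.
async function* fakeAgent(): AsyncGenerator<StreamEvent> {
  yield { type: 'token', token: 'Hel' };
  yield { type: 'token', token: 'lo' };
  yield { type: 'done', content: 'Hello' };
}

const saved: Status[] = [];
let text = '';
(async () => {
  for await (const e of persistentStream(fakeAgent(), (s) => { saved.push(s); })) {
    if (e.type === 'token') text += e.token;
  }
  console.log(text, saved); // Hello [ 'running', 'completed' ]
})();
```

This only covers exceptions; if the process dies outright, the last saved status stays `running`, which is why a recovery step that inspects stored jobs is still needed.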
streaming.ts
```typescript
import { DurableAgent } from '@orka-js/durable';

// redisStore is the RedisStore created under Storage Backends
const durableAgent = new DurableAgent(agent, redisStore);

// Stream events while persisting job state
for await (const event of durableAgent.runStream('job_999', 'Process invoice')) {
  if (event.type === 'job_status') {
    // job_status events are emitted at start, end, and on errors
    console.log('Job status:', event.job.status);
  } else if (event.type === 'token') {
    process.stdout.write(event.token); // LLM streaming
  } else if (event.type === 'tool_call') {
    console.log(`Tool called: ${event.name}`);
  } else if (event.type === 'done') {
    console.log('\nAgent finished:', event.content);
  }
}

// Job state is saved even if the process crashes mid-stream.
// If the job ended in failure, re-run it with its original input:
const recovered = await durableAgent.status('job_999');
if (recovered?.status === 'failed') {
  await durableAgent.run('job_999', recovered.input); // Retry
}
```
Retry Logic
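The feature list describes retries with exponential backoff. As a sketch of that technique (the library's exact schedule is not specified on this page), the delay before retry n can be computed as base × 2^(n−1), capped at a maximum. `backoffDelay` and `withRetries` are hypothetical names, and treating `retryDelayMs` as the base delay is an assumption.

```typescript
// Hypothetical exponential backoff: delay before retry n (1-based) is baseMs * 2^(n-1), capped.
function backoffDelay(retry: number, baseMs: number, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** (retry - 1));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `task` up to 1 + maxRetries times, backing off between failed attempts.
async function withRetries<T>(
  task: () => Promise<T>,
  maxRetries: number,
  baseMs: number,
): Promise<{ value?: T; attempts: number; error?: string }> {
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      return { value: await task(), attempts };
    } catch (err) {
      if (attempts - 1 >= maxRetries) {
        // Retries exhausted: report the last error instead of throwing.
        return { attempts, error: err instanceof Error ? err.message : String(err) };
      }
      await sleep(backoffDelay(attempts, baseMs));
    }
  }
}

console.log([1, 2, 3, 4].map((n) => backoffDelay(n, 1000))); // [ 1000, 2000, 4000, 8000 ]
```

Doubling the delay gives a struggling downstream service progressively more time to recover, and the cap keeps a long retry chain from waiting unreasonably long.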
retry.ts
```typescript
import { DurableAgent, MemoryStore } from '@orka-js/durable';

const durableAgent = new DurableAgent(agent, new MemoryStore(), {
  maxRetries: 5,      // Retry up to 5 times
  retryDelayMs: 1000, // Retry delay, in milliseconds
});

// If the agent throws an error, it will retry automatically
const job = await durableAgent.run('job_retry', 'Flaky API call');

console.log(job.attempts); // Number of attempts made
console.log(job.status);   // "completed" or "failed"
console.log(job.error);    // Error message if failed
```
Architecture
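The lifecycle in architecture.ts lists the job states but not which moves between them are legal. One plausible reading, written as an explicit transition table, is sketched below. The allowed transitions (for example, that a failed job may be re-run, as streaming.ts does, or that completed and cancelled jobs are terminal) are assumptions inferred from the examples on this page, not documented library behavior.

```typescript
type JobStatus = 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';

// Assumed legal moves between states; terminal states have no outgoing edges.
const transitions: Record<JobStatus, readonly JobStatus[]> = {
  pending: ['running', 'cancelled'],
  running: ['paused', 'completed', 'failed', 'cancelled'],
  paused: ['running', 'cancelled'],
  failed: ['running'], // re-run after failure
  completed: [],
  cancelled: [],
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return transitions[from].includes(to);
}

// Returns the new status, or throws if the move is not in the table.
function transition(from: JobStatus, to: JobStatus): JobStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  return to;
}

console.log(canTransition('paused', 'running')); // true
console.log(canTransition('completed', 'running')); // false
```

Centralizing the rules in one table makes operations like `pause()` or `cancel()` easy to validate before they touch stored state.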
architecture.ts
```typescript
// ─── Job Lifecycle ────────────────────────────────────────────────────────────
// 1. pending   → Job created, not started yet
// 2. running   → Agent is executing
// 3. paused    → User paused execution
// 4. completed → Agent finished successfully
// 5. failed    → Agent failed after max retries
// 6. cancelled → User cancelled the job

// ─── Job Schema ───────────────────────────────────────────────────────────────
interface DurableJob {
  id: string;
  input: string;
  status: 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';
  result?: string;
  error?: string;
  attempts: number;
  metadata?: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
  completedAt?: Date;
}
```
Production Best Practices
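- Use RedisStore in production; MemoryStore loses all jobs on restart and is meant for development only
- Set ttlSeconds on RedisStore so finished job records expire instead of accumulating indefinitely
- Monitor failed jobs (for example with durableAgent.list({ status: 'failed' })) and alert on them
- Use metadata to tag jobs by team, priority, or category
- Make agents and their tools idempotent so that retries are safe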
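Retries are only safe if repeating a step does not repeat its side effects. A common approach is to give each side effect an idempotency key and skip it when the key has already been recorded. The sketch below uses a hypothetical `sendEmailOnce` with in-memory collections standing in for durable storage; in production the record would need to live in the same persistent store as the job.

```typescript
// Hypothetical idempotent side effect: the key jobId + step identifies the effect,
// so a retried job skips effects that already happened.
const completedEffects = new Set<string>(); // stand-in for durable storage
const sentEmails: string[] = []; // stand-in for a real email provider

function sendEmailOnce(jobId: string, step: string, to: string): boolean {
  const key = `${jobId}:${step}`;
  if (completedEffects.has(key)) return false; // already done on a previous attempt
  sentEmails.push(to); // the actual side effect
  completedEffects.add(key); // record it so retries skip it
  return true;
}

// The first attempt sends the email; a retry of the same job does not send it again.
console.log(sendEmailOnce('job_123', 'notify-customer', 'a@example.com')); // true
console.log(sendEmailOnce('job_123', 'notify-customer', 'a@example.com')); // false
console.log(sentEmails.length); // 1
```

A crash between the side effect and recording its key can still cause one duplicate, so providers that accept their own idempotency keys give stronger guarantees.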