OrkaJS

Durable Agents

Persistent, resumable agent execution

Build fault-tolerant agents that survive crashes, support pause/resume, and persist state across restarts. Perfect for long-running workflows, scheduled tasks, and mission-critical operations.

# Installation

npm install @orka-js/durable
# or
pnpm add @orka-js/durable
 
# Optional: for Redis storage
npm install redis
 
# Optional: for cron scheduling
npm install node-cron

# Key Features

  • Crash Recovery: agents resume from the last checkpoint after a crash
  • Pause/Resume: pause execution and resume later with new input
  • Job Persistence: store job state in memory or Redis
  • Retry Logic: automatic retries with exponential backoff
  • Scheduled Jobs: cron-based scheduling for recurring tasks
  • Streaming Support: stream events while persisting progress

# Basic Usage

Wrap any agent with DurableAgent for persistence.

durable-agent.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
import { StreamingToolAgent } from '@orka-js/agent';
import { OpenAIAdapter } from '@orka-js/openai';
 
const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });
 
const agent = new StreamingToolAgent({
  goal: 'Process customer support tickets',
  tools: [ticketTool, emailTool],
}, llm);
 
// Wrap with DurableAgent
const store = new MemoryStore();
const durableAgent = new DurableAgent(agent, store, {
  maxRetries: 3,
  retryDelayMs: 2000,
  metadata: { team: 'support' },
});
 
// Run a job
const job = await durableAgent.run('job_123', 'Handle ticket #456');
console.log(job.status); // "completed"
console.log(job.result); // Agent output
 
// Check job status later
const status = await durableAgent.status('job_123');
console.log(status?.completedAt);

# Storage Backends

Choose between in-memory or Redis storage.

stores.ts
// ─── In-Memory Store (default) ───────────────────────────────────────────────
import { MemoryStore } from '@orka-js/durable';
 
const memoryStore = new MemoryStore();
// Jobs are lost on restart — good for development
 
// ─── Redis Store (production) ─────────────────────────────────────────────────
import { RedisStore } from '@orka-js/durable';
import { createClient } from 'redis';
 
const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();
 
const redisStore = new RedisStore(redis, {
  keyPrefix: 'orka:jobs:',
  ttlSeconds: 86400, // Jobs expire after 24h
});
 
const durableAgent = new DurableAgent(agent, redisStore);
 
// Jobs persist across restarts
const job = await durableAgent.run('job_456', 'Long-running task');
 
// Later, even after server restart:
const recovered = await durableAgent.status('job_456');
console.log(recovered?.status); // "completed"
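Both backends presumably satisfy a common store contract, which is what makes them interchangeable in the `DurableAgent` constructor. A minimal sketch of what that contract could look like — the interface and `MapStore` below are hypothetical, not the real `@orka-js/durable` API:

```typescript
// Hypothetical store contract: the shape MemoryStore and RedisStore
// plausibly share. Field and method names are assumptions.
interface JobRecord {
  id: string;
  status: string;
  updatedAt: Date;
}

interface JobStore {
  save(job: JobRecord): Promise<void>;
  load(id: string): Promise<JobRecord | undefined>;
  list(): Promise<JobRecord[]>;
}

// Map-backed reference implementation, mirroring what an in-memory
// store likely does internally.
class MapStore implements JobStore {
  private jobs = new Map<string, JobRecord>();

  async save(job: JobRecord): Promise<void> {
    this.jobs.set(job.id, { ...job }); // copy to avoid external mutation
  }

  async load(id: string): Promise<JobRecord | undefined> {
    return this.jobs.get(id);
  }

  async list(): Promise<JobRecord[]> {
    return [...this.jobs.values()];
  }
}
```

Because the contract is async throughout, swapping a Map for Redis is a constructor change rather than a rewrite.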

# Pause & Resume

Pause long-running jobs and resume them later.

pause-resume.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
 
const durableAgent = new DurableAgent(agent, new MemoryStore());
 
// Start a job
const job = await durableAgent.run('job_789', 'Analyze large dataset');
 
// User requests to pause
await durableAgent.pause('job_789');
console.log('Job paused — state saved');
 
// Later, resume with optional new input
const resumed = await durableAgent.resume('job_789', 'Continue with updated params');
console.log(resumed.status); // "completed"
 
// Cancel a job
await durableAgent.cancel('job_789');
 
// List all jobs
const allJobs = await durableAgent.list();
const pending = await durableAgent.list({ status: 'pending' });
const failed = await durableAgent.list({ status: 'failed' });
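A caller that needs to block until a paused or retried job settles can poll `status()` until the job reaches a terminal state. A small sketch — `waitForJob` and its polling interval are my own helper, not part of the library:

```typescript
// Statuses after which a job will no longer change.
const TERMINAL = new Set(['completed', 'failed', 'cancelled']);

// Poll a status lookup until the job reaches a terminal state.
// getStatus is injected so this works with any status() implementation.
async function waitForJob(
  getStatus: (id: string) => Promise<{ status: string } | undefined>,
  id: string,
  intervalMs = 500,
): Promise<string> {
  for (;;) {
    const job = await getStatus(id);
    if (job && TERMINAL.has(job.status)) return job.status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage would look like `await waitForJob((id) => durableAgent.status(id), 'job_789')`.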

# Scheduled Jobs

Run agents on a cron schedule.

scheduling.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
import cron from 'node-cron';
 
const durableAgent = new DurableAgent(agent, new MemoryStore(), {
  schedule: '0 9 * * MON', // Every Monday at 9 AM
  onSchedule: async () => {
    // Generate input for the scheduled run
    const tickets = await db.getUnresolvedTickets();
    return `Process ${tickets.length} pending tickets`;
  },
});
 
// Start the scheduler
durableAgent.startScheduler();
console.log('Scheduler started — agent runs every Monday at 9 AM');
 
// Stop the scheduler
durableAgent.stopScheduler();

# Streaming with Persistence

Stream events while saving job state.

streaming.ts
import { DurableAgent, RedisStore } from '@orka-js/durable';
 
const durableAgent = new DurableAgent(agent, redisStore);
 
// Stream events while persisting job state
for await (const event of durableAgent.runStream('job_999', 'Process invoice')) {
  if (event.type === 'job_status') {
    console.log('Job status:', event.job.status);
    // job_status events are emitted at start, end, and on errors
  } else if (event.type === 'token') {
    process.stdout.write(event.token); // LLM streaming
  } else if (event.type === 'tool_call') {
    console.log(`Tool called: ${event.name}`);
  } else if (event.type === 'done') {
    console.log('\nAgent finished:', event.content);
  }
}
 
// Job state is saved even if the process crashes mid-stream
// Resume from last checkpoint:
const recovered = await durableAgent.status('job_999');
if (recovered?.status === 'failed') {
  await durableAgent.run('job_999', recovered.input); // Retry
}
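The event handlers above imply a discriminated union over `event.type`. A sketch of what those shapes could look like — inferred from the fields the example reads, so the real `@orka-js/durable` types may carry more:

```typescript
// Event shapes inferred from the streaming example (hypothetical types).
type DurableEvent =
  | { type: 'job_status'; job: { status: string } }
  | { type: 'token'; token: string }
  | { type: 'tool_call'; name: string }
  | { type: 'done'; content: string };

// With a discriminated union, a switch on `type` narrows each branch,
// so event.token, event.name, etc. are type-safe without casts.
function describe(event: DurableEvent): string {
  switch (event.type) {
    case 'job_status':
      return `status: ${event.job.status}`;
    case 'token':
      return event.token;
    case 'tool_call':
      return `tool: ${event.name}`;
    case 'done':
      return `done: ${event.content}`;
  }
}
```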

# Retry Logic

retry.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
 
const durableAgent = new DurableAgent(agent, new MemoryStore(), {
  maxRetries: 5, // Retry up to 5 times
  retryDelayMs: 1000, // Wait 1s between retries
});
 
// If the agent throws an error, it will retry automatically
const job = await durableAgent.run('job_retry', 'Flaky API call');
 
console.log(job.attempts); // Number of attempts made
console.log(job.status); // "completed" or "failed"
console.log(job.error); // Error message if failed
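The feature list mentions exponential backoff, and `retryDelayMs` presumably seeds that schedule. As an illustration only (my own helper, not the library's internals), a doubling schedule with a cap looks like this:

```typescript
// Hypothetical backoff calculator: the base delay doubles on each
// attempt (0-indexed), capped at maxDelayMs.
function backoffDelay(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 30_000,
): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// attempt 0 → 1000 ms, attempt 1 → 2000 ms, attempt 2 → 4000 ms, ...
```

Production retry loops often add random jitter on top of this so that many failed jobs don't all retry at the same instant.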

# Architecture

architecture.ts
// ─── Job Lifecycle ────────────────────────────────────────────────────────────
 
// 1. pending → Job created, not started yet
// 2. running → Agent is executing
// 3. paused → User paused execution
// 4. completed → Agent finished successfully
// 5. failed → Agent failed after max retries
// 6. cancelled → User cancelled the job
 
// ─── Job Schema ───────────────────────────────────────────────────────────────
 
interface DurableJob {
  id: string;
  input: string;
  status: 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';
  result?: string;
  error?: string;
  attempts: number;
  metadata?: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
  completedAt?: Date;
}
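The lifecycle above is a small state machine, and only some transitions make sense (a completed job can't start running again, for example). One way to sketch it — the transition table below is my reading of the lifecycle, not the library's actual rules:

```typescript
type JobStatus = 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';

// Assumed legal transitions per the lifecycle notes above.
const TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  pending: ['running', 'cancelled'],
  running: ['paused', 'completed', 'failed', 'cancelled'],
  paused: ['running', 'cancelled'],
  completed: [], // terminal
  failed: ['running'], // a failed job may be re-run
  cancelled: [], // terminal
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

A guard like this is useful in code that drives jobs externally, e.g. rejecting a `resume()` call on a job that is already completed.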

# Production Best Practices

  • Use RedisStore for production — MemoryStore is for dev only
  • Set a TTL on Redis jobs to avoid unbounded storage growth
  • Monitor failed jobs and set up alerts
  • Use metadata to tag jobs by team, priority, or category
  • Implement idempotent agents — retries should be safe
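On the last point, one common idempotency pattern is to derive the job ID deterministically from the work item, so a retry or duplicate trigger reuses the existing job instead of creating a second one. A sketch — `jobIdFor` and the ID format are my own, not a library convention:

```typescript
import { createHash } from 'node:crypto';

// Derive a stable job ID from the work item. Running the same item
// twice yields the same ID, so a store keyed by ID deduplicates it.
function jobIdFor(kind: string, itemId: string): string {
  const digest = createHash('sha256')
    .update(`${kind}:${itemId}`)
    .digest('hex');
  return `job_${digest.slice(0, 12)}`;
}

// e.g. durableAgent.run(jobIdFor('ticket', '456'), 'Handle ticket #456')
```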