OrkaJS

Durable Agents

Persistent, resumable agent execution

Build fault-tolerant agents that survive crashes, support pause/resume, and persist state across restarts. Perfect for long-running workflows, scheduled tasks, and mission-critical operations.

# Installation

npm install @orka-js/durable
# or
pnpm add @orka-js/durable
 
# Optional: for Redis storage
npm install redis
 
# Optional: for cron scheduling
npm install node-cron

# Key Features

  • Crash Recovery: agents resume from the last checkpoint after a crash
  • Pause/Resume: pause execution and resume later with new input
  • Job Persistence: store job state in memory or Redis
  • Retry Logic: automatic retries with exponential backoff
  • Scheduled Jobs: cron-based scheduling for recurring tasks
  • Streaming Support: stream events while persisting progress

# Basic Usage

Wrap any agent with DurableAgent for persistence.

durable-agent.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
import { StreamingToolAgent } from '@orka-js/agent';
import { OpenAIAdapter } from '@orka-js/openai';
 
const llm = new OpenAIAdapter({ apiKey: process.env.OPENAI_API_KEY! });
 
const agent = new StreamingToolAgent({
  goal: 'Process customer support tickets',
  tools: [ticketTool, emailTool],
}, llm);
 
// Wrap with DurableAgent
const store = new MemoryStore();
const durableAgent = new DurableAgent(agent, store, {
  maxRetries: 3,
  retryDelayMs: 2000,
  metadata: { team: 'support' },
});
 
// Run a job
const job = await durableAgent.run('job_123', 'Handle ticket #456');
console.log(job.status); // "completed"
console.log(job.result); // Agent output
 
// Check job status later
const status = await durableAgent.status('job_123');
console.log(status?.completedAt);

# Storage Backends

Choose between in-memory or Redis storage.

stores.ts
// ─── In-Memory Store (default) ───────────────────────────────────────────────
import { MemoryStore } from '@orka-js/durable';
 
const memoryStore = new MemoryStore();
// Jobs are lost on restart — good for development
 
// ─── Redis Store (production) ─────────────────────────────────────────────────
import { RedisStore } from '@orka-js/durable';
import { createClient } from 'redis';
 
const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();
 
const redisStore = new RedisStore(redis, {
  keyPrefix: 'orka:jobs:',
  ttlSeconds: 86400, // Jobs expire after 24h
});
 
const durableAgent = new DurableAgent(agent, redisStore);
 
// Jobs persist across restarts
const job = await durableAgent.run('job_456', 'Long-running task');
 
// Later, even after server restart:
const recovered = await durableAgent.status('job_456');
console.log(recovered?.status); // "completed"
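Both backends presumably satisfy a common store contract, which is what makes them interchangeable in the `DurableAgent` constructor. A minimal sketch of what that contract could look like — the interface and `MapStore` below are hypothetical, not the real `@orka-js/durable` API:

```typescript
// Hypothetical store contract: the shape MemoryStore and RedisStore
// plausibly share. Field and method names are assumptions.
interface JobRecord {
  id: string;
  status: string;
  updatedAt: Date;
}

interface JobStore {
  save(job: JobRecord): Promise<void>;
  load(id: string): Promise<JobRecord | undefined>;
  list(): Promise<JobRecord[]>;
}

// Map-backed reference implementation, mirroring what an in-memory
// store likely does internally.
class MapStore implements JobStore {
  private jobs = new Map<string, JobRecord>();

  async save(job: JobRecord): Promise<void> {
    this.jobs.set(job.id, { ...job }); // copy to avoid external mutation
  }

  async load(id: string): Promise<JobRecord | undefined> {
    return this.jobs.get(id);
  }

  async list(): Promise<JobRecord[]> {
    return [...this.jobs.values()];
  }
}
```

Because the contract is async throughout, swapping a Map for Redis is a constructor change rather than a rewrite.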

# Pause & Resume

Pause long-running jobs and resume them later.

pause-resume.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
 
const durableAgent = new DurableAgent(agent, new MemoryStore());
 
// Start a job
const job = await durableAgent.run('job_789', 'Analyze large dataset');
 
// User requests to pause
await durableAgent.pause('job_789');
console.log('Job paused — state saved');
 
// Later, resume with optional new input
const resumed = await durableAgent.resume('job_789', 'Continue with updated params');
console.log(resumed.status); // "completed"
 
// Cancel a job
await durableAgent.cancel('job_789');
 
// List all jobs
const allJobs = await durableAgent.list();
const pending = await durableAgent.list({ status: 'pending' });
const failed = await durableAgent.list({ status: 'failed' });
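A caller that needs to block until a paused or retried job settles can poll `status()` until the job reaches a terminal state. A small sketch — `waitForJob` and its polling interval are my own helper, not part of the library:

```typescript
// Statuses after which a job will no longer change.
const TERMINAL = new Set(['completed', 'failed', 'cancelled']);

// Poll a status lookup until the job reaches a terminal state.
// getStatus is injected so this works with any status() implementation.
async function waitForJob(
  getStatus: (id: string) => Promise<{ status: string } | undefined>,
  id: string,
  intervalMs = 500,
): Promise<string> {
  for (;;) {
    const job = await getStatus(id);
    if (job && TERMINAL.has(job.status)) return job.status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage would look like `await waitForJob((id) => durableAgent.status(id), 'job_789')`.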

# Scheduled Jobs

Run agents on a cron schedule.

scheduling.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
import cron from 'node-cron';
 
const durableAgent = new DurableAgent(agent, new MemoryStore(), {
  schedule: '0 9 * * MON', // Every Monday at 9 AM
  onSchedule: async () => {
    // Generate input for the scheduled run
    const tickets = await db.getUnresolvedTickets();
    return `Process ${tickets.length} pending tickets`;
  },
});
 
// Start the scheduler
durableAgent.startScheduler();
console.log('Scheduler started — agent runs every Monday at 9 AM');
 
// Stop the scheduler
durableAgent.stopScheduler();

# Streaming with Persistence

Stream events while saving job state.

streaming.ts
import { DurableAgent, RedisStore } from '@orka-js/durable';
 
const durableAgent = new DurableAgent(agent, redisStore);
 
// Stream events while persisting job state
for await (const event of durableAgent.runStream('job_999', 'Process invoice')) {
  if (event.type === 'job_status') {
    console.log('Job status:', event.job.status);
    // job_status events are emitted at start, end, and on errors
  } else if (event.type === 'token') {
    process.stdout.write(event.token); // LLM streaming
  } else if (event.type === 'tool_call') {
    console.log(`Tool called: ${event.name}`);
  } else if (event.type === 'done') {
    console.log('\nAgent finished:', event.content);
  }
}
 
// Job state is saved even if the process crashes mid-stream
// Resume from last checkpoint:
const recovered = await durableAgent.status('job_999');
if (recovered?.status === 'failed') {
  await durableAgent.run('job_999', recovered.input); // Retry
}
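The event handlers above imply a discriminated union over `event.type`. A sketch of what those shapes could look like — inferred from the fields the example reads, so the real `@orka-js/durable` types may carry more:

```typescript
// Event shapes inferred from the streaming example (hypothetical types).
type DurableEvent =
  | { type: 'job_status'; job: { status: string } }
  | { type: 'token'; token: string }
  | { type: 'tool_call'; name: string }
  | { type: 'done'; content: string };

// With a discriminated union, a switch on `type` narrows each branch,
// so event.token, event.name, etc. are type-safe without casts.
function describe(event: DurableEvent): string {
  switch (event.type) {
    case 'job_status':
      return `status: ${event.job.status}`;
    case 'token':
      return event.token;
    case 'tool_call':
      return `tool: ${event.name}`;
    case 'done':
      return `done: ${event.content}`;
  }
}
```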

# Retry Logic

retry.ts
import { DurableAgent, MemoryStore } from '@orka-js/durable';
 
const durableAgent = new DurableAgent(agent, new MemoryStore(), {
  maxRetries: 5, // Retry up to 5 times
  retryDelayMs: 1000, // Wait 1s between retries
});
 
// If the agent throws an error, it will retry automatically
const job = await durableAgent.run('job_retry', 'Flaky API call');
 
console.log(job.attempts); // Number of attempts made
console.log(job.status); // "completed" or "failed"
console.log(job.error); // Error message if failed
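The feature list mentions exponential backoff, and `retryDelayMs` presumably seeds that schedule. As an illustration only (my own helper, not the library's internals), a doubling schedule with a cap looks like this:

```typescript
// Hypothetical backoff calculator: the base delay doubles on each
// attempt (0-indexed), capped at maxDelayMs.
function backoffDelay(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 30_000,
): number {
  return Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
}

// attempt 0 → 1000 ms, attempt 1 → 2000 ms, attempt 2 → 4000 ms, ...
```

Production retry loops often add random jitter on top of this so that many failed jobs don't all retry at the same instant.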

# Architecture

architecture.ts
// ─── Job Lifecycle ────────────────────────────────────────────────────────────
 
// 1. pending → Job created, not started yet
// 2. running → Agent is executing
// 3. paused → User paused execution
// 4. completed → Agent finished successfully
// 5. failed → Agent failed after max retries
// 6. cancelled → User cancelled the job
 
// ─── Job Schema ───────────────────────────────────────────────────────────────
 
interface DurableJob {
  id: string;
  input: string;
  status: 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';
  result?: string;
  error?: string;
  attempts: number;
  metadata?: Record<string, unknown>;
  createdAt: Date;
  updatedAt: Date;
  completedAt?: Date;
}
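The lifecycle above is a small state machine, and only some transitions make sense (a completed job can't start running again, for example). One way to sketch it — the transition table below is my reading of the lifecycle, not the library's actual rules:

```typescript
type JobStatus = 'pending' | 'running' | 'paused' | 'completed' | 'failed' | 'cancelled';

// Assumed legal transitions per the lifecycle notes above.
const TRANSITIONS: Record<JobStatus, JobStatus[]> = {
  pending: ['running', 'cancelled'],
  running: ['paused', 'completed', 'failed', 'cancelled'],
  paused: ['running', 'cancelled'],
  completed: [], // terminal
  failed: ['running'], // a failed job may be re-run
  cancelled: [], // terminal
};

function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

A guard like this is useful in code that drives jobs externally, e.g. rejecting a `resume()` call on a job that is already completed.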

# Production Best Practices

  • Use RedisStore for production — MemoryStore is for dev only
  • Set a TTL on Redis jobs to avoid unbounded storage growth
  • Monitor failed jobs and set up alerts
  • Use metadata to tag jobs by team, priority, or category
  • Implement idempotent agents — retries should be safe
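On the last point, one common idempotency pattern is to derive the job ID deterministically from the work item, so a retry or duplicate trigger reuses the existing job instead of creating a second one. A sketch — `jobIdFor` and the ID format are my own, not a library convention:

```typescript
import { createHash } from 'node:crypto';

// Derive a stable job ID from the work item. Running the same item
// twice yields the same ID, so a store keyed by ID deduplicates it.
function jobIdFor(kind: string, itemId: string): string {
  const digest = createHash('sha256')
    .update(`${kind}:${itemId}`)
    .digest('hex');
  return `job_${digest.slice(0, 12)}`;
}

// e.g. durableAgent.run(jobIdFor('ticket', '456'), 'Handle ticket #456')
```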