Stall Detection — Recover Unresponsive Jobs Automatically in Bun
Stall detection automatically identifies and recovers jobs that become unresponsive during processing.
How It Works
- Workers send periodic heartbeats while processing jobs
- The queue manager checks for jobs without recent heartbeats
- Stalled jobs are either retried or moved to the DLQ
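
The check described above can be sketched as a pure function. This is an illustrative model only, not bunqueue's internal implementation: the `ActiveJob` and `StallTiming` types and the `isStalled` and `findStalled` helpers are hypothetical names, and the sketch assumes a job is exempt from checks until its grace period has elapsed.

```typescript
// Illustrative sketch only; these types and helpers are hypothetical,
// not part of the bunqueue API.
interface StallTiming {
  stallInterval: number; // ms without a heartbeat before a job counts as stalled
  gracePeriod: number;   // ms after start during which a job is never flagged
}

interface ActiveJob {
  id: string;
  startedAt: number;     // epoch ms when processing began
  lastHeartbeat: number; // epoch ms of the most recent heartbeat
}

function isStalled(job: ActiveJob, now: number, cfg: StallTiming): boolean {
  // Assumption: no stall checks apply while the job is inside its grace period.
  if (now - job.startedAt < cfg.gracePeriod) return false;
  // Stalled when the last heartbeat is older than the stall interval.
  return now - job.lastHeartbeat > cfg.stallInterval;
}

function findStalled(jobs: ActiveJob[], now: number, cfg: StallTiming): string[] {
  return jobs.filter((job) => isStalled(job, now, cfg)).map((job) => job.id);
}
```

A periodic checker would call `findStalled` with the current time and hand each returned job to the retry or DLQ path.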
Configuration
```typescript
import { Queue } from 'bunqueue/client';

const queue = new Queue('my-queue', { embedded: true });

queue.setStallConfig({
  enabled: true,        // Enable stall detection (default: true)
  stallInterval: 30000, // Job is stalled after 30s without heartbeat
  maxStalls: 3,         // Move to DLQ after 3 stalls
  gracePeriod: 5000,    // Grace period after job starts
});
```

Options
| Option | Default | Description |
|---|---|---|
| `enabled` | `true` | Enable/disable stall detection |
| `stallInterval` | `30000` | Time (ms) without a heartbeat before a job is considered stalled |
| `maxStalls` | `3` | Max stalls before the job is moved to the DLQ |
| `gracePeriod` | `5000` | Initial grace period (ms) after a job starts |
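
To see how a partial configuration relates to the defaults in the table, here is a minimal sketch. It assumes that omitted options fall back to the listed defaults; `StallConfig`, `DEFAULT_STALL_CONFIG`, and `resolveStallConfig` are hypothetical names used for illustration and are not part of the bunqueue API.

```typescript
// Hypothetical model of the options table; not bunqueue's own types.
interface StallConfig {
  enabled: boolean;
  stallInterval: number; // ms
  maxStalls: number;
  gracePeriod: number;   // ms
}

// Defaults copied from the table above.
const DEFAULT_STALL_CONFIG: StallConfig = {
  enabled: true,
  stallInterval: 30000,
  maxStalls: 3,
  gracePeriod: 5000,
};

// Assumption: any option left out of the partial config keeps its default.
function resolveStallConfig(partial: Partial<StallConfig> = {}): StallConfig {
  return { ...DEFAULT_STALL_CONFIG, ...partial };
}
```

Under that assumption, passing only `stallInterval`, `maxStalls`, and `gracePeriod` leaves `enabled` at `true`.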
Worker Heartbeats
Workers automatically send heartbeats:

```typescript
import { Worker } from 'bunqueue/client';

// `processor` is your job handler function, defined elsewhere
const worker = new Worker('queue', processor, {
  embedded: true,
  heartbeatInterval: 10000, // Heartbeat every 10 seconds
});
```

Keep `heartbeatInterval` lower than `stallInterval`; otherwise a healthy job can be flagged as stalled between two heartbeats.
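
The relationship between the two intervals can be checked with simple arithmetic. The helper below is a hypothetical sketch, not part of the bunqueue API: it reports whether the heartbeat interval is strictly shorter than the stall interval and how many full heartbeat periods fit into one stall window.

```typescript
// Hypothetical helper for sanity-checking a configuration; not a bunqueue function.
function checkHeartbeatConfig(
  heartbeatInterval: number,
  stallInterval: number,
): { ok: boolean; heartbeatsPerWindow: number } {
  if (heartbeatInterval <= 0 || stallInterval <= 0) {
    throw new RangeError('intervals must be positive');
  }
  return {
    // The heartbeat must arrive more often than the stall window expires.
    ok: heartbeatInterval < stallInterval,
    // Number of full heartbeat periods that fit in one stall window.
    heartbeatsPerWindow: Math.floor(stallInterval / heartbeatInterval),
  };
}
```

With a 10000 ms heartbeat and the default 30000 ms stall interval, three heartbeat periods fit in each window, which leaves room for an occasional delayed heartbeat.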
Stall Actions
When a job stalls, one of these actions is taken:
- Retry - Job is re-queued with an incremented stall count
- Move to DLQ - Job exceeds `maxStalls` and is moved to the Dead Letter Queue
When a job is retried after a stall or lock expiry, its internal counters (queued count, shard stats) are updated correctly and waiting workers are notified immediately. This means requeued jobs are picked up without delay.
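
The choice between the two actions can be modeled as a small decision function. This is a sketch under a stated assumption, not bunqueue's actual logic: it treats the job as bound for the DLQ once its stall count reaches `maxStalls`, following the "Move to DLQ after 3 stalls" comment in the configuration example. The exact boundary in bunqueue may differ, and `onStall` and `StallAction` are hypothetical names.

```typescript
// Hypothetical sketch of the retry-or-DLQ decision; not bunqueue's implementation.
type StallAction = 'retry' | 'dlq';

function onStall(
  stallCount: number, // stalls recorded before this one
  maxStalls: number,
): { action: StallAction; stallCount: number } {
  const next = stallCount + 1; // every detected stall increments the count
  // Assumption: the job goes to the DLQ once the count reaches maxStalls.
  return { action: next >= maxStalls ? 'dlq' : 'retry', stallCount: next };
}
```

With `maxStalls: 3`, the first and second stalls yield `retry` and the third yields `dlq`.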
Events
```typescript
import { QueueEvents } from 'bunqueue/client';

const events = new QueueEvents('my-queue');

events.on('stalled', ({ jobId }) => {
  console.log(`Job ${jobId} stalled`);
});
```

Example: Long-Running Jobs
For jobs that take a long time, increase the stall interval:

```typescript
import { Queue, Worker } from 'bunqueue/client';

// Queue for video processing (may take hours)
const videoQueue = new Queue('video-processing', { embedded: true });

videoQueue.setStallConfig({
  stallInterval: 300000, // 5 minutes
  maxStalls: 2,
  gracePeriod: 60000, // 1 minute grace
});

// Worker with frequent heartbeats
// `video` and `processChunk` stand in for application code not shown here
const worker = new Worker('video-processing', async (job) => {
  for (const chunk of video.chunks) {
    await processChunk(chunk);
    // updateProgress() also sends a heartbeat to reset the stall timer
    await job.updateProgress(chunk.progress);
  }
}, {
  embedded: true,
  heartbeatInterval: 30000, // Automatic heartbeat every 30 seconds
});
```

Monitoring
Check stall-related stats:

```typescript
const stats = queue.getDlqStats();
console.log('Stalled jobs in DLQ:', stats.byReason.stalled);
```

Filter the DLQ by the stalled reason:

```typescript
const stalledJobs = queue.getDlq({ reason: 'stalled' });
```

SandboxedWorker
SandboxedWorker automatically sends heartbeats in both embedded and TCP mode. In embedded mode, `heartbeatInterval` defaults to 5000 ms, which keeps `lastHeartbeat` fresh so that long-running jobs are not falsely detected as stalled.

```typescript
const worker = new SandboxedWorker('heavy-jobs', {
  processor: './processor.ts',
  timeout: 0, // Disable worker-level timeout for long jobs
  heartbeatInterval: 5000, // Default in embedded mode (keeps stall detection happy)
});
```