Stall Detection — Recover Unresponsive Jobs Automatically in Bun

Stall detection automatically identifies and recovers jobs that become unresponsive during processing.

How It Works

1. Workers send periodic heartbeats while processing jobs.
2. The queue manager checks for active jobs that have no recent heartbeat.
3. Stalled jobs are either retried or moved to the dead letter queue (DLQ).
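
The check in step 2 can be sketched as a pure function. This is only an illustration of the idea, not bunqueue's internal code: the `ActiveJob` shape, the function names, and the reading of `gracePeriod` as "skip the check for this long after the job starts" are all assumptions made for the example.

```typescript
// Sketch only: models the stall check described above, not bunqueue internals.
interface StallCheckConfig {
  stallInterval: number; // ms without a heartbeat before a job counts as stalled
  gracePeriod: number;   // ms after job start during which the check is skipped (assumed meaning)
}

// Hypothetical shape for a job that is currently being processed.
interface ActiveJob {
  id: string;
  startedAt: number;     // epoch ms when processing began
  lastHeartbeat: number; // epoch ms of the most recent heartbeat
}

function isStalled(job: ActiveJob, config: StallCheckConfig, now: number): boolean {
  // Assumption: a job still inside its grace period is never flagged.
  if (now - job.startedAt < config.gracePeriod) return false;
  // Flag the job only when the silence is strictly longer than stallInterval.
  return now - job.lastHeartbeat > config.stallInterval;
}

function findStalledJobs(jobs: ActiveJob[], config: StallCheckConfig, now: number): string[] {
  return jobs.filter((job) => isStalled(job, config, now)).map((job) => job.id);
}

const config: StallCheckConfig = { stallInterval: 30_000, gracePeriod: 5_000 };
const now = 100_000;
const jobs: ActiveJob[] = [
  { id: 'fresh', startedAt: 50_000, lastHeartbeat: 95_000 },  // last heartbeat 5s ago
  { id: 'silent', startedAt: 50_000, lastHeartbeat: 60_000 }, // last heartbeat 40s ago
  { id: 'new', startedAt: 98_000, lastHeartbeat: 98_000 },    // started 2s ago, in grace period
];

// Only "silent" has gone longer than stallInterval without a heartbeat.
console.log(findStalledJobs(jobs, config, now));
```

Passing `now` in as a parameter keeps the check deterministic, which makes this kind of logic easy to unit test.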

Configuration

import { Queue } from 'bunqueue/client';

const queue = new Queue('my-queue', { embedded: true });

queue.setStallConfig({
  enabled: true,         // Enable stall detection (default: true)
  stallInterval: 30000,  // Job is stalled after 30s without heartbeat
  maxStalls: 3,          // Move to DLQ after 3 stalls
  gracePeriod: 5000,     // Grace period after job starts
});

Options

Option	Default	Description
`enabled`	`true`	Enable or disable stall detection
`stallInterval`	`30000`	Time in ms without a heartbeat before a job is considered stalled
`maxStalls`	`3`	Maximum number of stalls before the job is moved to the DLQ
`gracePeriod`	`5000`	Initial grace period in ms after a job starts

Worker Heartbeats

Workers automatically send heartbeats:

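import { Worker } from 'bunqueue/client';

// `processor` is your own job handler, e.g. async (job) => { ... }
const worker = new Worker('queue', processor, {
  embedded: true,
  heartbeatInterval: 10000, // Heartbeat every 10 seconds
});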

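Keep heartbeatInterval well below stallInterval to avoid false positives. With a 10-second heartbeat and the default 30-second stallInterval, several heartbeats fall inside each stall window, so a single delayed heartbeat does not mark the job as stalled.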
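
To reason about how much slack a given pair of values leaves, a small helper can compute how many consecutive heartbeats may be missed before the next one would arrive too late. This is a back-of-the-envelope sketch, not part of bunqueue: `checkHeartbeatConfig` is a hypothetical name, and the model assumes heartbeats are sent exactly on schedule and that a job is flagged only when the silence strictly exceeds `stallInterval`.

```typescript
interface HeartbeatCheck {
  ok: boolean;                       // true when heartbeatInterval < stallInterval
  missedHeartbeatsTolerated: number; // consecutive misses that still avoid a stall
}

// Hypothetical helper: sanity-check a heartbeat/stall interval pair (both in ms).
function checkHeartbeatConfig(heartbeatInterval: number, stallInterval: number): HeartbeatCheck {
  if (heartbeatInterval <= 0 || stallInterval <= 0) {
    throw new RangeError('Intervals must be positive numbers of milliseconds');
  }
  if (heartbeatInterval >= stallInterval) {
    // The stall window can elapse before the next heartbeat is even due.
    return { ok: false, missedHeartbeatsTolerated: 0 };
  }
  // Heartbeats scheduled within one stall window; the last of them must arrive.
  const heartbeatsPerWindow = Math.floor(stallInterval / heartbeatInterval);
  return { ok: true, missedHeartbeatsTolerated: heartbeatsPerWindow - 1 };
}

console.log(checkHeartbeatConfig(10_000, 30_000));  // 10s heartbeat, 30s stall interval
console.log(checkHeartbeatConfig(30_000, 300_000)); // 30s heartbeat, 5 min stall interval
console.log(checkHeartbeatConfig(30_000, 30_000));  // equal intervals: flagged as not ok
```

In practice, network latency and event-loop pauses eat into this margin, so treat the result as an upper bound rather than a guarantee.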

Stall Actions

When a job stalls, one of these actions is taken:

Retry: the job is re-queued and its stall count is incremented.
Move to DLQ: once the job exceeds maxStalls, it is moved to the Dead Letter Queue.

When a job is retried after a stall or lock expiry, its internal counters (queued count, shard stats) are updated correctly and waiting workers are notified immediately. This means requeued jobs are picked up without delay.
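
The choice between the two actions can be sketched as follows. This is an illustration under a stated assumption, not bunqueue's implementation: the stall count is incremented first, and the job goes to the DLQ once the count reaches `maxStalls` (matching the "Move to DLQ after 3 stalls" comment in the configuration example). The exact boundary in the library may differ, and `StalledJob` and `handleStall` are hypothetical names.

```typescript
type StallAction = 'retry' | 'dlq';

// Hypothetical minimal view of a job for this sketch.
interface StalledJob {
  id: string;
  stallCount: number; // how many times this job has stalled so far
}

function handleStall(job: StalledJob, maxStalls: number): { action: StallAction; stallCount: number } {
  const stallCount = job.stallCount + 1;
  // Assumption: the job is dead-lettered once its stall count reaches maxStalls.
  const action: StallAction = stallCount >= maxStalls ? 'dlq' : 'retry';
  return { action, stallCount };
}

// Simulate one job stalling repeatedly with maxStalls = 3.
let job: StalledJob = { id: 'job-1', stallCount: 0 };
const history: StallAction[] = [];
for (let i = 0; i < 3; i++) {
  const result = handleStall(job, 3);
  history.push(result.action);
  job = { ...job, stallCount: result.stallCount };
}

// Under the assumption above: two retries, then the DLQ.
console.log(history);
```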

Events

import { QueueEvents } from 'bunqueue/client';

const events = new QueueEvents('my-queue');

events.on('stalled', ({ jobId }) => {
  console.log(`Job ${jobId} stalled`);
});
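
One possible use of the `stalled` event is to keep a per-job stall count for alerting. The sketch below is standalone and does not import bunqueue. `StallTracker` is a hypothetical helper, and it assumes `jobId` is a string. In a real handler you would call `tracker.record(jobId)` from inside `events.on('stalled', ...)`.

```typescript
// Hypothetical helper: counts stalls per job so repeat offenders can be surfaced.
class StallTracker {
  private counts = new Map<string, number>();

  // Record one stall for the job and return its running total.
  record(jobId: string): number {
    const next = (this.counts.get(jobId) ?? 0) + 1;
    this.counts.set(jobId, next);
    return next;
  }

  // Ids of jobs that have stalled at least `threshold` times.
  repeatOffenders(threshold: number): string[] {
    return Array.from(this.counts.entries())
      .filter(([, count]) => count >= threshold)
      .map(([jobId]) => jobId);
  }
}

const tracker = new StallTracker();

// Stand-in for a stream of 'stalled' events; with bunqueue this would be:
// events.on('stalled', ({ jobId }) => tracker.record(jobId));
for (const jobId of ['a', 'b', 'a']) {
  tracker.record(jobId);
}

// Job "a" stalled twice, job "b" once.
console.log(tracker.repeatOffenders(2));
```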

Example: Long-Running Jobs

For jobs that take a long time, increase the stall interval:

// Queue for video processing (may take hours)
const videoQueue = new Queue('video-processing', { embedded: true });

videoQueue.setStallConfig({
  stallInterval: 300000,  // 5 minutes
  maxStalls: 2,
  gracePeriod: 60000,     // 1 minute grace
});

// Worker with frequent heartbeats
// Assumes job.data.chunks holds the video chunks and processChunk() is your own helper
const worker = new Worker('video-processing', async (job) => {
  for (const chunk of job.data.chunks) {
    await processChunk(chunk);
    // updateProgress() also sends a heartbeat to reset the stall timer
    await job.updateProgress(chunk.progress);
  }
}, {
  embedded: true,
  heartbeatInterval: 30000, // Automatic heartbeat every 30 seconds
});

Monitoring

Check stall-related stats:

const stats = queue.getDlqStats();
console.log('Stalled jobs in DLQ:', stats.byReason.stalled);

Filter DLQ by stalled reason:

const stalledJobs = queue.getDlq({ reason: 'stalled' });

SandboxedWorker

SandboxedWorker automatically sends heartbeats in both embedded and TCP mode. In embedded mode, heartbeatInterval defaults to 5000ms, keeping lastHeartbeat fresh so long-running jobs are not falsely detected as stalled.

const worker = new SandboxedWorker('heavy-jobs', {
  processor: './processor.ts',
  timeout: 0,              // Disable worker-level timeout for long jobs
  heartbeatInterval: 5000, // Default in embedded mode (keeps stall detection happy)
});