# bunqueue Changelog: Version History & Release Notes

All notable changes to bunqueue are documented here.
## [2.6.67] - 2026-03-22

### Changed

- Disabled flaky SandboxedWorker tests — Commented out all 35 SandboxedWorker tests across 5 files. Bun's Worker threads are still unstable and cause intermittent race conditions and crashes in parallel test runs. Tests will be re-enabled once Bun Workers stabilize.
## [2.6.66] - 2026-03-22

- Deduplication not working for JobScheduler (Issue #60) — `upsertJobScheduler` accepted deduplication options in the `JobTemplate` but silently discarded them. The cron system (`CronJob`, `CronJobInput`, `cronScheduler`) had no fields for `uniqueKey` or `dedup`, so every cron tick created a new job regardless of deduplication settings. Dedup options are now stored in the cron job (including SQLite persistence with schema migration v6) and passed through to `pushJob()` on each tick. When a worker is slow or offline, only one job per dedup key exists instead of unbounded duplicates.
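The tick-time behavior described above can be modeled with a standalone sketch (this is an illustrative model, not bunqueue's actual implementation; `MiniQueue` and its members are hypothetical names):

```typescript
// Model of dedup-aware cron ticks: a push with a uniqueKey that is still
// queued is skipped, so a slow/offline worker accumulates at most one job
// per dedup key instead of one per tick.
type Job = { id: number; uniqueKey?: string };

class MiniQueue {
  private jobs: Job[] = [];
  private byKey = new Map<string, Job>();
  private nextId = 1;

  // Skip the insert when a job with the same dedup key is still queued.
  pushJob(uniqueKey?: string): Job | null {
    if (uniqueKey && this.byKey.has(uniqueKey)) return null; // deduplicated
    const job: Job = { id: this.nextId++, uniqueKey };
    this.jobs.push(job);
    if (uniqueKey) this.byKey.set(uniqueKey, job);
    return job;
  }

  // A worker completing the job frees the key for the next tick.
  complete(job: Job): void {
    this.jobs = this.jobs.filter((j) => j.id !== job.id);
    if (job.uniqueKey) this.byKey.delete(job.uniqueKey);
  }

  get size(): number {
    return this.jobs.length;
  }
}
```

Three cron ticks with no worker running leave exactly one queued job; completing it allows the next tick's push to succeed again.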
## [2.6.65] - 2026-03-22

- MCP operation tracking for Cloud dashboard — Every MCP tool invocation (73 tools) is now tracked and sent to bunqueue.io as part of the cloud snapshot. Each operation records: tool name, queue affected, timestamp, duration, success/failure, and error message. Data is buffered in a bounded ring buffer (max 200 ops, ~40KB) and drained into each snapshot. In embedded mode, the MCP process creates its own CloudAgent to send telemetry. Zero overhead when cloud is not configured. Adds `mcpOperations` (raw invocation history) and `mcpSummary` (aggregated stats with top tools) fields to `CloudSnapshot`.
## [2.6.64] - 2026-03-21

- No-lock ack fails after stall re-queue (data loss) — When a worker with `useLocks=false` processed a job that stall detection re-queued, the `ack()` call threw "Job not found" with no recovery path, leaving the job stuck in the queue forever. The existing Issue #33 handler (`completeStallRetriedJob`) only fired when a lock token was present. The handler now also fires for tokenless acks when the job was stall-retried (`attempts > 0`), preventing false completions of freshly-pushed jobs.
## [2.6.63] - 2026-03-21

### Performance

- WorkerRateLimiter: O(n) → O(1) amortized — Replaced `Array.filter()` with head-pointer eviction for sliding-window token expiration. Eliminates per-poll array allocation and removes `Math.min(...spread)` (a potential stack overflow on large token arrays). Benchmarked: 10k tokens went from 31µs to ~0µs per call; zero memory allocation per poll cycle.
- FlowProducer: parallel sibling creation in TCP mode — `add()`, `addBulk()`, `addBulkThen()`, and `addTree()` now create independent children/jobs concurrently via `Promise.all`. TCP benchmarks show a 3–6x speedup for flows with 10–20 children (network round-trips overlap instead of serializing). `addBulkThen()` uses `Promise.allSettled` for proper cleanup on partial failure. No impact in embedded mode (pushes are synchronous). `addChain()` is unchanged (sequential by design).
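The head-pointer eviction idea can be sketched in isolation (illustrative class, not bunqueue's actual `WorkerRateLimiter`): timestamps arrive in order, so expired entries are always a prefix, and advancing an index evicts them without allocating a new array.

```typescript
// Sliding-window counter with head-pointer eviction instead of Array.filter().
class SlidingWindow {
  private timestamps: number[] = [];
  private head = 0; // index of the oldest live token

  constructor(private readonly windowMs: number) {}

  record(now: number): void {
    this.timestamps.push(now);
  }

  // Count tokens inside the window, evicting expired ones by moving `head`.
  count(now: number): number {
    const cutoff = now - this.windowMs;
    while (this.head < this.timestamps.length && this.timestamps[this.head] <= cutoff) {
      this.head++;
    }
    // Occasionally compact so the backing array cannot grow without bound.
    if (this.head > 1024 && this.head * 2 > this.timestamps.length) {
      this.timestamps = this.timestamps.slice(this.head);
      this.head = 0;
    }
    return this.timestamps.length - this.head;
  }
}
```

Each `count()` call does O(1) amortized work: every recorded timestamp is evicted at most once over its lifetime.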
## [2.6.62] - 2026-03-21

- E2E webhook tests failing after SSRF validation — Added a `validateWebhookUrls` option to `QueueManagerConfig` so tests using localhost can disable URL validation.
## [2.6.60] - 2026-03-21

- Webhook SSRF prevention in embedded mode — `WebhookManager.add()` now validates URLs against SSRF (localhost, private IPs, cloud metadata). Previously this was only enforced at the TCP server layer, leaving the embedded SDK unprotected.
- Docs: pin Zod v3 for Starlight — Fixed a Vercel build crash caused by Zod v4 incompatibility with Starlight 0.31.
### Changed

- Extracted `validateWebhookUrl` to a shared module — `src/shared/webhookValidation.ts` is now the single source of truth, re-exported from `protocol.ts` for backward compatibility.
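The kind of checks such a validator performs can be sketched as follows (a simplified illustration, not bunqueue's actual `validateWebhookUrl`; a production validator must also handle IPv6 notations, DNS resolution/rebinding, and redirects):

```typescript
// Reject webhook URLs that point at loopback, RFC 1918 private ranges,
// or cloud metadata endpoints (classic SSRF targets).
function isSafeWebhookUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host === "127.0.0.1" || host === "[::1]") return false;
  if (host === "169.254.169.254" || host === "metadata.google.internal") return false; // cloud metadata
  if (/^10\./.test(host)) return false;                      // 10.0.0.0/8
  if (/^192\.168\./.test(host)) return false;                // 192.168.0.0/16
  if (/^172\.(1[6-9]|2\d|3[01])\./.test(host)) return false; // 172.16.0.0/12
  return true;
}
```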
## [2.6.49] - 2026-03-20

- Cloud: 20 new remote commands — Full dashboard control via WebSocket:
  - Queue: `obliterate`, `promoteAll`, `retryCompleted`, `rateLimit`, `clearRateLimit`, `concurrency`, `clearConcurrency`, `stallConfig`, `dlqConfig`
  - Job: `push`, `priority`, `discard`, `delay`, `updateData`, `clearLogs`
  - Webhook: `add`, `remove`, `set-enabled`
  - Other: `s3:backup`
- Shared `deriveState` and `mapJob` helpers — Eliminated triplicated state-derivation logic in command handlers.
## [2.6.48] - 2026-03-20

### Changed

- Cloud: auth via HTTP upgrade headers — WebSocket authentication now uses `Authorization`, `X-Instance-Id`, and `X-Remote-Commands` headers on the upgrade request (Bun-specific). Eliminates the JSON handshake message and the 100ms delay workaround.
- Cloud: removed client-side ping — Client-side ping (every 10s) was causing false disconnects (code 4000). Keepalive now relies solely on server-side ping (25s), with bunqueue responding pong.
- Cloud: duplicate reconnect guard — `scheduleReconnect()` now prevents multiple concurrent reconnect timers.
- Cloud: `onclose` logs at `info` level — Previously `debug`, making reconnect failures invisible in production logs.
## [2.6.47] - 2026-03-20

- Programmatic `dataPath` for embedded mode — Queue and Worker accept a `dataPath` option to set the SQLite database path without env vars. Resolves conflicts with apps that use their own `DATA_PATH`. (#59)
- `BUNQUEUE_DATA_PATH`/`BQ_DATA_PATH` env vars — New namespaced env vars for data-path configuration. Priority: `BUNQUEUE_DATA_PATH` > `BQ_DATA_PATH` > `DATA_PATH` > `SQLITE_PATH`. Backward compatible.
- Cloud: snapshots via WebSocket — Snapshots are now sent over WS when connected (`{ type: "snapshot", ...data }`), falling back to HTTP POST only when WS is down.
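The documented data-path precedence amounts to a first-defined-wins lookup; a sketch (the helper name is illustrative, only the env var names and their order come from the changelog):

```typescript
// Resolve the SQLite data path: namespaced vars win over the legacy ones.
function resolveDataPath(env: Record<string, string | undefined>): string | undefined {
  return (
    env.BUNQUEUE_DATA_PATH ??
    env.BQ_DATA_PATH ??
    env.DATA_PATH ??
    env.SQLITE_PATH
  );
}
```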
## [2.6.46] - 2026-03-20

- Cloud: resilient WebSocket with ring buffer — Events are buffered (max 1000) when WS is disconnected and flushed after `handshake_ack` on reconnect (with a 5s fallback timeout). Zero event loss during brief disconnections.
- Cloud: client-side ping heartbeat — bunqueue sends `{ type: "ping" }` every 10s to the dashboard; if no pong arrives within 5s, it closes the socket and reconnects. Dead-connection detection reduced from ~40s to ~10s.
- Cloud: dual-channel failover — When WS is down, buffered events are embedded in the HTTP snapshot (`snapshot.events`), so the dashboard stays informed even during prolonged disconnections.
- Cloud: double reconnect race — Pong timeout no longer calls `scheduleReconnect()` directly; it delegates to `onclose` to prevent duplicate sockets.
- Cloud: local socket reference — All handlers (pong, handshake, commands) use the local `ws` variable, not `this.ws`, preventing replies on stale sockets after reconnect.
- Cloud: old socket cleanup — The previous socket is explicitly closed and its handlers nulled before creating a new connection.
## [2.6.45] - 2026-03-20

- Cloud: `prev` and `delay` fields in WebSocket events — CloudEvent now forwards all JobEvent fields: `prev` (previous state on removed/retried) and `delay` (ms for delayed jobs).
- Cloud: WebSocket binary frame handling — Ping/pong and command messages now handle both text and binary WebSocket frames (ArrayBuffer/Buffer), preventing silent parse failures behind Cloudflare.
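Handling both frame types boils down to normalizing the message to a string before `JSON.parse`; a sketch (illustrative helper, not bunqueue's code; Node's `Buffer` is covered by the `Uint8Array` branch):

```typescript
// Normalize a WebSocket message frame to a string so binary frames
// (ArrayBuffer / Uint8Array) do not fail JSON parsing silently.
function frameToString(msg: string | ArrayBuffer | Uint8Array): string {
  if (typeof msg === "string") return msg;
  const bytes = msg instanceof Uint8Array ? msg : new Uint8Array(msg);
  return new TextDecoder().decode(bytes);
}
```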
## [2.6.44] - 2026-03-20

- Cloud: WebSocket ping/pong heartbeat — Pong responses are now sent regardless of the `BUNQUEUE_CLOUD_REMOTE_COMMANDS` config. Previously, ping messages were silently dropped when remote commands were disabled, causing the dashboard to disconnect the agent every ~60s as a zombie connection.
## [2.6.43] - 2026-03-19

- Cloud: `job:list` command — Paginated job listing per queue with state filtering (`queue`, `state`, `limit`, `offset`).
- Cloud: `job:get` command — Full job detail with logs and result included.
- Cloud: `queue:detail` command — Queue detail with counts, config, DLQ entries, and job list.
- Cloud: recentJobs now includes completed/failed jobs — Was only querying waiting/active/delayed states.
- Cloud: `job:list` total count — Now returns the actual queue count instead of the page length.
- Cloud: activeQueues filter — Restored the skip-empty-queues optimization that was broken by an over-broad filter.
## [2.6.42] - 2026-03-19

### Performance

- Cloud: two-tier snapshot collection — Light data (stats, throughput, latency, memory) is collected every 5s at O(SHARD_COUNT). Heavy data (recentJobs, dlqEntries, topErrors, workerDetails, queueConfigs, webhooks) is collected every 30s and cached between refreshes. Heavy collectors skip empty queues (only iterating queues with waiting/active/dlq > 0). Eliminated a double `getQueueJobCounts()` pass.
- Cloud: totalCompleted/totalFailed per queue — Was sending the in-memory BoundedSet count (which resets when full). Now sends cumulative counters from `perQueueMetrics` (never reset).
## [2.6.41] - 2026-03-19

### Enhanced

- bunqueue Cloud: enterprise-grade telemetry — The snapshot now includes per-queue totals (`totalCompleted`/`totalFailed`), connection stats (TCP/WS/SSE clients), webhook delivery stats, top errors grouped by message, cron execution counts, S3 backup status, and per-queue rate-limit and concurrency config. Added `job:logs` and `job:result` remote commands for on-demand data. Auth errors (401/403) are now logged at error level instead of being silently buffered.
## [2.6.40] - 2026-03-19

### Added (Beta)

- bunqueue Cloud — Remote dashboard telemetry agent. Connect any bunqueue instance to bunqueue.io with just 2 env vars (`BUNQUEUE_CLOUD_URL` + `BUNQUEUE_CLOUD_API_KEY`). Zero overhead when disabled.
  - Snapshot channel — HTTP POST every 5s with full server state: stats, throughput, latency percentiles, memory, per-queue counts, worker details, cron jobs, storage status, DLQ entries, recent jobs.
  - Event channel — Outbound WebSocket for real-time job event forwarding (Failed, Stalled, etc.) with configurable filtering.
  - Remote commands (opt-in) — The dashboard can execute commands on the instance via the same WebSocket: `queue:pause`, `queue:resume`, `queue:drain`, `dlq:retry`, `dlq:purge`, `job:cancel`, `job:promote`, `cron:upsert`, `cron:delete`. Requires `BUNQUEUE_CLOUD_REMOTE_COMMANDS=true`.
  - Multi-instance — Multiple bunqueue instances can connect to the same dashboard with separate instance IDs and names.
  - Resilience — Offline snapshot buffer (720 snapshots), circuit breaker, WebSocket auto-reconnect with exponential backoff + jitter, graceful shutdown with a final snapshot.
  - Security — API-key auth, optional HMAC-SHA256 signing, job data redaction, remote commands disabled by default.
  - New env vars: `BUNQUEUE_CLOUD_URL`, `BUNQUEUE_CLOUD_API_KEY`, `BUNQUEUE_CLOUD_INSTANCE_NAME`, `BUNQUEUE_CLOUD_INTERVAL_MS`, `BUNQUEUE_CLOUD_REMOTE_COMMANDS`, `BUNQUEUE_CLOUD_SIGNING_SECRET`, `BUNQUEUE_CLOUD_INCLUDE_JOB_DATA`, `BUNQUEUE_CLOUD_REDACT_FIELDS`, `BUNQUEUE_CLOUD_EVENTS`.
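Exponential backoff with jitter, as named under Resilience, is a standard technique; a minimal sketch (the base and cap values here are illustrative, not bunqueue's actual constants):

```typescript
// "Full jitter" backoff: the delay doubles per attempt up to a cap, and the
// actual sleep is drawn uniformly from [0, delay) so reconnecting clients
// do not stampede the server in lockstep.
function reconnectDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt); // 1s, 2s, 4s, ... capped
  return Math.random() * exp; // uniform in [0, exp)
}
```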
## [2.6.39] - 2026-03-18

- `EventType.Paused`/`EventType.Resumed` missing from enum — Added `Paused` and `Resumed` variants to the `EventType` const enum, fixing TypeScript compilation errors in `queueManager.ts` and `client/events.ts`.
- `UnrecoverableError`/`DelayedError` not exported — Added `src/client/errors.ts` with BullMQ-compatible error classes (`UnrecoverableError` to skip retries, `DelayedError` to re-delay jobs) and exported them from `bunqueue/client`.
- Webhook mapping for pause/resume events — `eventsManager.ts` now handles `Paused` and `Resumed` event types in the webhook switch.
- Issue #53 test — Regression test for the worker `log` event firing.
## [2.6.38] - 2026-03-18

- Worker registration + heartbeat system — The Worker SDK now auto-registers with the server on `run()`, sends periodic heartbeats with `activeJobs`/`processed`/`failed` stats, and unregisters on `close()`. The server tracks `hostname`, `pid`, and `uptime` per worker. `GET /workers` and the `ListWorkers` TCP command return full worker details including aggregate stats. The dashboard receives real-time events (`worker:connected`, `worker:heartbeat`, `worker:disconnected`).
- `RegisterWorkerCommand` extended — Accepts `workerId`, `hostname`, `pid`, `startedAt` from the client. Re-registration with the same `workerId` updates instead of duplicating.
- `HeartbeatCommand` extended — Accepts `activeJobs`, `processed`, `failed` to sync client-side stats to the server.
- `onOutcome` callback in processor — Tracks completed/failed counts without adding event listeners.

### Removed

- Flaky embedded tests (sandboxed-workers, cron-event-driven, query-operations)
## [2.6.37] - 2026-03-17

- `getJobCounts` now returns `delayed` and `paused` counts — Matches BullMQ's `getJobCounts()` return type. Both embedded and TCP modes include `delayed` (jobs with a future `runAt`) and `paused` (the waiting-jobs count when the queue is paused). (#56)
- `getJobs` supports multiple statuses — Accepts `string | string[]` for the `state` parameter, matching BullMQ's `getJobs(types?: JobType | JobType[])` interface. Works in embedded, TCP, and HTTP (`?state=waiting&state=delayed`). (#55)
- `GET /queues/summary` endpoint — Returns all queues with name, paused status, and job counts in a single HTTP call, replacing N+1 round-trips.

### Removed

- Flaky TCP integration tests (sandboxed-worker, monitoring)
## [2.6.36] - 2026-03-17

- `/queues/:queue/jobs/list` performance — The endpoint was taking 300–450ms even with `limit=2` because it scanned the entire jobIndex (O(N) iterations + O(N) individual SQLite lookups) and then sorted all results. It now delegates to a single indexed SQLite query with `LIMIT`/`OFFSET`, reducing response time to <5ms.
## [2.6.35] - 2026-03-16

### Changed

- Removed flaky SandboxedWorker flow failure test
## [2.6.34] - 2026-03-16

- QueueEvents failed events — `failedReason` now correctly reads from `event.error` instead of `event.data`, job `data` is included in failed broadcasts, and error emission includes event context. (#54) — thanks @simontong

### Changed

- CI — Disabled TCP and Embedded integration tests in the GitHub Actions pipeline
- Removed flaky SandboxedWorker tests
## [2.6.33] - 2026-03-16

- Worker `log` event — `worker.on('log', (job, message) => ...)` now works with full TypeScript autocomplete. The `log` event is emitted when `job.log()` is called inside the processor, matching SandboxedWorker behavior. (#53)
## [2.6.32] - 2026-03-16

- 13 new WebSocket/SSE events — `job:expired`, `flow:completed`, `flow:failed`, `queue:idle`, `queue:threshold`, `worker:overloaded`, `worker:error`, `cron:skipped`, `storage:size-warning`, `server:memory-warning` (plus the `flow:*` wildcard). Total event types: 86.
- Monitoring checks — Periodic threshold monitoring runs on the cleanup interval (10s). Configurable via env vars: `QUEUE_IDLE_THRESHOLD_MS`, `QUEUE_SIZE_THRESHOLD`, `MEMORY_WARNING_MB`, `STORAGE_WARNING_MB`, `WORKER_OVERLOAD_THRESHOLD_MS`.
- Cron overlap detection — Crons skip execution if the previous instance fired within 80% of the repeat interval, emitting `cron:skipped` instead.
- Flow lifecycle events — `flow:completed` when all children of a parent job finish, `flow:failed` when a child permanently fails (moves to DLQ).
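The cron overlap rule above reduces to a single comparison; a sketch (illustrative helper name, only the 80% threshold comes from the changelog):

```typescript
// Skip a cron tick if the previous fire was less than 80% of the repeat
// interval ago (the previous instance is presumably still running).
function shouldSkipTick(lastFiredAt: number, now: number, intervalMs: number): boolean {
  return now - lastFiredAt < intervalMs * 0.8;
}
```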
### Changed

- SandboxedWorker docs — Clearly marked as experimental across all documentation pages (worker, migration, CPU-intensive, stall-detection, troubleshooting), with a production recommendation to use the standard `Worker` instead.
## [2.6.31] - 2026-03-16

- SandboxedWorker `autoStart` option — Automatically restarts the worker pool when new jobs arrive after idle shutdown. Set `autoStart: true` with `idleTimeout` to get workers that sleep when idle and wake up when needed. Configurable poll interval via `autoStartPollMs` (default: 5000ms). Closes #51.
## [2.6.30] - 2026-03-16

- Full WebSocket/SSE event coverage — 73 unique event types are now emitted across all transports. Every state change, operation, and lifecycle event is observable via WebSocket pub/sub and SSE.
  - New event categories: `job:timeout`, `job:lock-expired`, `job:deduplicated`, `job:waiting-children`, `job:dependencies-resolved`, `job:stalled` (dashboard), `job:moved-to-delayed`
  - Backup events: `storage:backup-started`, `storage:backup-completed`, `storage:backup-failed`
  - Connection tracking: `client:connected`, `client:disconnected`, `auth:failed`
  - Batch events: `batch:pushed`, `batch:pulled`
  - DLQ maintenance events: `dlq:auto-retried`, `dlq:expired`
  - Cron lifecycle: `cron:fired`, `cron:missed`, `cron:updated` (distinguishes create vs update)
  - Worker events: `worker:heartbeat`, `worker:idle`, `worker:removed-stale`
  - Webhook events: `webhook:fired`, `webhook:failed`, `webhook:enabled`, `webhook:disabled`
  - Queue lifecycle: `queue:created`, `queue:removed` (on obliterate and cleanup)
  - Rate/concurrency: `ratelimit:hit`, `ratelimit:rejected`, `concurrency:rejected`
  - Server lifecycle: `server:started`, `server:shutdown`, `server:recovered`
  - Cleanup events: `cleanup:orphans-removed`, `cleanup:stale-deps-removed`
  - Memory: `memory:compacted`
## [2.6.29] - 2026-03-16

- TCP integration tests — 4 new test suites: backoff strategies, job move methods, parent failure options, worker advanced methods. TCP test coverage is now at 56 suites.
## [2.6.28] - 2026-03-15

- `getChildrenValues` empty in TCP mode — Fixed the response-envelope unwrap in the worker processor (`response.data.values` instead of `response.values`). Fixed `childrenIds`/`parentId` not being passed through the TCP protocol in flow jobs. (#49, PR by @simontong)
## [2.6.27] - 2026-03-15

- `getJob` returns null for failed/DLQ jobs — In embedded mode (no SQLite storage), `getJob()` and `getJobByCustomId()` now correctly query the shard DLQ instead of returning null. (#50)
- `getChildrenValues` wired in worker — The worker job processor now correctly passes the `getChildrenValues` callback.
- WebSocket/SSE integration tests — 88 new integration tests covering WebSocket and SSE event streaming.
## [2.6.26] - 2026-03-15

- Enterprise-grade SSE — Event IDs for client-side deduplication, Last-Event-ID resume with a ring buffer (1000 events), heartbeat keepalive (30s), retry field (3s auto-reconnect), connection limit (1000 max with 503 rejection).
- Enterprise-grade WebSocket — Backpressure detection via `getBufferedAmount()` (1MB threshold), dead-client cleanup in emit/broadcast, connection limit (1000 max), dropped-message counter for observability.
- Worker options — Documented 8 missing options: `limiter`, `lockDuration`, `maxStalledCount`, `skipStalledCheck`, `skipLockRenewal`, `drainDelay`, `removeOnComplete`, `removeOnFail`.
- FlowProducer BullMQ v5 API — Documented `add()`, `addBulk()`, `getFlow()` methods with `FlowJob`/`JobNode` interfaces.
- Lifecycle functions — Documented `shutdownManager()`, `closeSharedTcpClient()`, `closeAllSharedPools()`.
- Environment variables — Added `BUNQUEUE_MODE`, `BUNQUEUE_HOST`, `BUNQUEUE_PORT` to the env-vars reference.
## [2.6.25] - 2026-03-14

- `GET /queues/:q/workers` crash — Fixed a crash when some workers were registered without a `queues` field (undefined/null). Now safely skips workers with missing queues and defaults to `[]` on creation.
## [2.6.24] - 2026-03-14

- Per-queue completed count — The `GET /queues/:q/counts` `completed` field now counts only jobs completed in the requested queue instead of returning the global total across all queues.
- DLQ endpoint returns full metadata — `GET /queues/:q/dlq` now returns `DlqEntry[]` with `enteredAt`, `reason`, `error`, `retryCount`, `lastRetryAt`, `nextRetryAt`, `expiresAt` instead of raw `Job[]`.
- Worker registration accepts `queue` (singular) — `POST /workers` now accepts both `queue` (string) and `queues` (array), plus `workerId` as an alias for `name`.
- Per-queue `totalCompleted`/`totalFailed` counters — `GET /queues/:q/counts` now includes cumulative per-queue counters for completed and failed jobs.
- `GET /queues/:q/workers` endpoint — New endpoint to list workers registered for a specific queue.
- `GET /queues/:q/dlq/stats` endpoint — Server-side DLQ stats aggregation: `total`, `byReason`, `pendingRetry`, `oldestEntry`.
- Worker `concurrency`, `status`, `currentJob` fields — `GET /workers` and `POST /workers` responses now include `concurrency`, a computed `status` (active/stale), and `currentJob`.
- Throughput rates in `GET /stats` — Added `pushPerSec`, `pullPerSec`, `completePerSec`, `failPerSec` from the built-in throughput tracker.
## [2.6.23] - 2026-03-14

- Dashboard beta demo — Added a demo video and beta CTA to the README and docs introduction page.
## [2.6.22] - 2026-03-14

- `dlq:added` WebSocket event — Now emitted when a job moves to the DLQ after max attempts are exceeded. Previously this event was defined but never fired.
- `job:progress` WebSocket event — The progress value is now included in the event payload. Previously `progress` was `undefined` because the broadcast didn't set the top-level field.
- Comprehensive WebSocket pub/sub integration test — 47 assertions covering all 9 event categories (job lifecycle, queue, DLQ, cron, worker, rate-limit, concurrency, webhook, config, system periodic) plus protocol tests (subscribe, unsubscribe, wildcard, invalid patterns, Ping over WS).
## [2.6.21] - 2026-03-14

### Performance

- Batch push notifyBatch() — Batch push now wakes all waiting workers correctly via `notifyBatch(N)` instead of a single `notify()` call. Each waiter is woken individually, fixing a bug where only 1 of N workers received jobs immediately.
- Pre-compiled HTTP route regexes — All 40+ regex patterns in HTTP route files are now compiled once at module load instead of per request (~100µs/request savings).

### Security

- constantTimeEqual timing fix — Removed the early return on length mismatch that leaked token length via a timing side-channel.
- Batch PUSHB data validation — Individual job data size is now validated in batch push (it was only checked in single PUSH), preventing a 10MB limit bypass.
- Dashboard queue name validation — `GET /dashboard/queues/:queue` now validates queue names like all other endpoints.
- Error message sanitization — SQLite/database error messages are no longer leaked to clients in TCP and HTTP error responses.
- Silent error swallowing — Replaced 7 empty `.catch(() => {})` blocks with proper error logging in the addBatcher flush and sandboxed worker stop/kill/restart/heartbeat paths.
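The constantTimeEqual fix follows a standard pattern: instead of returning early on a length mismatch (which reveals the expected token's length through timing), fold the length difference into the accumulator and always scan the full expected string. A sketch of the technique (illustrative, not bunqueue's exact code; for equal-length buffers, Node's `crypto.timingSafeEqual` is the production primitive):

```typescript
// Length-independent constant-time string comparison: the loop always runs
// over the whole expected value, and any mismatch (including length) flips
// bits in `diff` without branching on secret data.
function constantTimeEqual(expected: string, actual: string): boolean {
  let diff = expected.length ^ actual.length;
  for (let i = 0; i < expected.length; i++) {
    // Use a sentinel past the end of `actual` so short inputs still compare.
    const a = i < actual.length ? actual.charCodeAt(i) : 0xffff;
    diff |= expected.charCodeAt(i) ^ a;
  }
  return diff === 0;
}
```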
## [2.6.20] - 2026-03-14

- Centralized HTTP JSON body parsing — Replaced per-file `parseBody()` with a shared `parseJsonBody()` that returns proper 400 responses for invalid JSON instead of silently falling back to `{}`.
- Dashboard pagination — Added `limit` and `offset` query parameters to `GET /dashboard/queues`. Workers and crons lists are capped at 100 entries with a `truncated` flag.
- ESLint complexity reduction — Extracted job push/pull/bulk operations into a `routeJobOps()` helper to keep `routeQueueRoutes` under the 45-branch complexity limit.
## [2.6.19] - 2026-03-14

- WebSocket idle timeout (ping/pong) — Set `idleTimeout: 120` on the WebSocket server. Bun automatically sends ping frames and closes connections that don't respond with pong within 120 seconds. Dead clients (crash, network drop, kill -9) are now detected and cleaned up automatically instead of leaking in the clients Map forever.
- WebSocket max payload limit — Set `maxPayloadLength` to 1MB, preventing memory exhaustion from oversized messages.
## [2.6.18] - 2026-03-14

- WebSocket pub/sub system with 50 event types — Clients subscribe to specific events via `{ cmd: "Subscribe", events: ["job:*", "stats:snapshot"] }` and receive only matching data. Supports wildcard patterns (`*`, `job:*`, `queue:*`, `worker:*`, `dlq:*`, `cron:*`, etc.). Legacy clients (no Subscribe) continue receiving all events in the old format.
- Periodic dashboard broadcasts — `stats:snapshot` every 5s (global stats, per-queue counts, throughput, workers), `health:status` every 10s (uptime, memory, connections), `storage:status` every 30s (collection sizes, disk health).
- `queue:counts` event — Fired on every job state change with real-time counts for the affected queue. Eliminates the N+1 polling problem for dashboards (20 queues = 0 HTTP calls instead of 200+/min).
- Dashboard event hooks — 30+ operations now emit real-time events: `job:promoted`, `job:discarded`, `job:priority-changed`, `job:data-updated`, `job:delay-changed`, `queue:paused/resumed/drained/cleaned/obliterated`, `dlq:retried/purged`, `cron:created/deleted`, `webhook:added/removed`, `ratelimit:set/cleared`, `concurrency:set/cleared`, `config:stall-changed/dlq-changed`, `worker:connected/disconnected`.
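The wildcard subscription matching described above can be sketched as follows (illustrative helpers, not bunqueue's actual matcher): `*` matches everything, a `category:*` pattern matches any event in that category, and anything else is an exact match.

```typescript
// Match one subscription pattern against a concrete event name.
function eventMatches(pattern: string, event: string): boolean {
  if (pattern === "*") return true;
  if (pattern.endsWith(":*")) return event.startsWith(pattern.slice(0, -1)); // keep "category:"
  return pattern === event;
}

// A client receives an event if any of its subscribed patterns match.
function isSubscribed(patterns: string[], event: string): boolean {
  return patterns.some((p) => eventMatches(p, event));
}
```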
### Changed

- HTTP API docs rewritten — 2,048 lines of enterprise-grade documentation with deep explanations of job lifecycle, retry behavior, and stall detection, every endpoint with curl examples, full request/response specs, and all 50 pub/sub events with payload schemas.
## [2.6.17] - 2026-03-14

- Memory leak in HTTP client tracking — Every HTTP PULL+ACK cycle created an orphaned entry in the `clientJobs` Map that was never cleaned up; over time this grew unbounded. Fix: HTTP requests no longer set a `clientId` (stateless). Job-ownership tracking only applies to persistent connections (TCP/WebSocket). Orphaned HTTP jobs are handled by stall detection.
## [2.6.16] - 2026-03-14

- PUSH `maxAttempts` silently ignored via HTTP — The HTTP endpoint mapped `attempts` instead of `maxAttempts`, causing retry configuration to be discarded. It now correctly maps to `maxAttempts` (and still accepts `attempts` for backwards compatibility).
- GetJobs pagination broken via HTTP — The HTTP endpoint sent `start`/`end` instead of `offset`/`limit`, causing query parameters to be silently ignored. Pagination now works correctly.
- Batch HTTP endpoints unreachable — `/jobs/ack-batch`, `/jobs/extend-locks`, and `/jobs/heartbeat-batch` were intercepted by the generic `/jobs/:id` pattern. Fixed by matching exact batch paths before the wildcard pattern.
## [2.6.15] - 2026-03-14

- Full HTTP REST API parity with the TCP protocol — All 76 TCP commands are now accessible via HTTP endpoints. Previously only 17 endpoints were available. New endpoints include:
  - Job management: promote, update data, get state, get result, get/update progress, change priority, discard to DLQ, move to delayed, change delay, wait for completion, get children values
  - Job logs: add, get, and clear structured logs per job
  - Job locking: heartbeat, extend lock, batch heartbeat, batch extend locks
  - Batch operations: bulk push (`PUSHB`), batch pull (`PULLB`), batch acknowledge (`ACKB`)
  - Queue control: list queues, list jobs by state, job counts, priority counts, pause/resume, drain, obliterate, clean with grace period, promote all delayed, retry completed
  - DLQ: list DLQ jobs, retry (single or all), purge
  - Rate limiting & concurrency: set/clear per-queue rate limits and concurrency limits
  - Queue configuration: get/set stall detection config, get/set DLQ config
  - Cron jobs: full CRUD (list, add, get, delete)
  - Webhooks: full CRUD (list, add, remove, enable/disable)
  - Workers: list, register, unregister, worker heartbeat
  - Monitoring: ping, storage status
- HTTP route architecture — Routes split into 4 files (`httpRouteJobs.ts`, `httpRouteQueues.ts`, `httpRouteQueueConfig.ts`, `httpRouteResources.ts`) for maintainability.
- HTTP API documentation rewritten — Enterprise-grade docs with curl examples, full request/response specs, parameter tables, and error cases for every endpoint (1,640 lines).
## [2.6.14] - 2026-03-14

- CLI double execution — Every CLI command ran twice due to `main()` being called both on module load and on import. Added an `import.meta.main` guard.
- CLI ACK/FAIL rejected UUID job IDs — `parseBigIntArg()` only accepted numeric IDs (`/^\d+$/`) but all job IDs are UUIDs. Now accepts any non-empty string ID.
- CLI ACK/FAIL always failed — Each CLI command opens a new TCP connection. When the PULL connection closed, jobs were auto-released back to waiting, so an ACK on a new connection found the job no longer in processing. Added a `detach` flag to the PULL command for CLI usage.
- `job get` showed `State: unknown` — The GetJob response didn't include job state. Now includes state from `getJobState()`.
- `queue jobs` state column showed `-` — The GetJobs handler didn't include state per job. Now injects state for each returned job.
- `bunqueue -p <port>` (without `start`) ignored the port flag — Direct mode ignored all CLI flags. Now routes to the CLI parser when flags are present.
- Worker/webhook/cron/logs/metrics list showed `OK` — The server wraps responses in `{data: {...}}` but the CLI formatter only checked top-level keys. Added an `unwrap()` helper.
- Cron list showed `OK` — The server returns a `crons` key but the formatter checked for `cronJobs`.
- Worker/webhook list showed stats instead of entries — The `stats` check ran before `workers`/`webhooks` in formatter priority order.
- Worker register showed a queue list — The response `queues` field triggered the queue-list formatter.
- DLQ list format broken — The formatter expected a `jobId` field but the server returns `id`.
- Metrics showed `OK` — Prometheus metrics are nested in `data.metrics`.
## [2.6.9] - 2026-03-10

- SandboxedWorker graceful stop — `stop()` now drains active jobs before terminating worker threads, preventing data loss when stopping during job processing. Added a `force` parameter for immediate termination when needed. (#39)
## [2.6.7] - 2026-03-08

- CronScheduler stale heap bug — When a cron job was removed, `scheduleNext()` encountered the stale heap entry and returned early without setting any timer, preventing all subsequent crons from firing. It now properly pops stale entries from the min-heap until a valid one is found. (#33)
- Graceful shutdown under burst load — Fixed `worker.close(true)` causing unhandled AckBatcher errors when jobs were still completing during burst-load scenarios. Changed to a graceful close with proper drain.
- 53 new test suites — Comprehensive test coverage across embedded and TCP modes:
  - Batch 1–3 (19 embedded + 18 TCP): stress, ETL, retry, cron, queue group, shutdown, backpressure, priorities, lifecycle, data integrity, deduplication, timeouts, flows, removal, pause/resume, worker scaling, cancellation, DLQ patterns, bulk ops
  - Coverage gap tests (16 embedded): auto-batching, webhook delivery, durable jobs, rate limiting, lock race conditions, flow + stall detection, cron timezone/DST, LIFO queue, DLQ selective retry, S3 backup concurrent, webhook SSRF, MCP edge cases, CLI error formatting, flow deduplication, sandboxed worker + flow, queue group + flow
  - Total test count increased from ~4,000 to 4,903
- Removed BullMQ-only WorkerOptions from API types (`lockDuration`, `maxStalledCount`, etc.)
- Added auto-batching documentation to the Queue guide
- Added a connection pool sizing note to the Worker guide
- Fixed CLI help: removed non-existent socket options and fake interactive prompts

### Performance

- CronScheduler `scheduleNext()` now handles stale entries in O(k) amortized time instead of blocking indefinitely
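The fix follows the lazy-deletion pattern for priority queues: removed crons leave stale entries behind, and the scheduler must keep popping until it finds a live one rather than bailing on the first stale hit. A simplified sketch (a sorted array stands in for the min-heap; all names here are illustrative, not bunqueue's internals):

```typescript
// Lazy deletion: pop stale entries (crons no longer in the live set) until a
// valid one surfaces. Each stale entry is popped at most once, so the cost is
// O(k) amortized for k stale entries rather than an early return that
// schedules nothing.
type HeapEntry = { cronId: string; nextRunAt: number };

function nextLiveEntry(heap: HeapEntry[], live: Set<string>): HeapEntry | undefined {
  heap.sort((a, b) => a.nextRunAt - b.nextRunAt); // stand-in for min-heap order
  while (heap.length > 0) {
    const top = heap[0];
    if (live.has(top.cronId)) return top; // valid entry: schedule its timer
    heap.shift(); // stale entry for a removed cron: discard and keep going
  }
  return undefined; // heap exhausted, nothing to schedule
}
```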
## [2.6.6] - 2026-03-07

- Parent-child flow race condition — Resolved a race where concurrent ack/fail operations on parent-child flows could cause inconsistent state. (#31)
- Embedded Worker heartbeats — Fixed the embedded Worker heartbeat mechanism not properly keeping jobs alive during long processing. (#32)
[2.6.5] - 2026-03-06
Section titled “[2.6.5] - 2026-03-06”- SandboxedWorker
logevent not emitted — The processor’sjob.log()method stored logs viaaddLog()but the SandboxedWorker never emitted a'log'event. Listeners registered with.on('log', ...)were never called. Now properly emits(job, message)on each log call. (#29) - SandboxedWorker embedded heartbeats missing — In embedded mode,
sendHeartbeatwas a no-op andheartbeatIntervaldefaulted to 0 (timer never started). Long-running jobs withoutprogress()calls were detected as stalled and moved to DLQ despite still running. NowsendHeartbeatcallsmanager.jobHeartbeat()and defaults to 5000ms. (#30)
- Typed event overloads for the `'log'` event on SandboxedWorker (`on`/`once`)
- Regression tests for both issues (`test/issue29-sandboxed-log.test.ts`, `test/issue30-dlq-stall.test.ts`)
- Updated SandboxedWorker processor example with `log()`, `fail()`, and `parentId` fields
- Fixed `heartbeatInterval` default from `0` to `5000` in embedded mode docs
- Added `log` event to SandboxedWorker Event Reference (8 events total)
- Added SandboxedWorker section to Stall Detection guide
- Updated `SandboxedWorkerOptions` type with `heartbeatInterval` and `connection` fields
[2.6.4] - 2026-03-05
- Lock token race condition — Resolved a race where concurrent ack/fail operations could use an expired lock token, causing "Invalid or expired lock token" errors under high concurrency. (#28)
- SandboxedWorker generics — `SandboxedWorker<T>` now supports a generic type parameter for typed events (e.g., `worker.on('completed', (job: Job<MyData>) => ...)`)
- Processor API improvements — Processor files now receive `log()`, `fail()`, and `parentId` on the job object alongside `progress()`
- Typed `on()`/`once()` overloads for all SandboxedWorker events (#25)
[2.6.2] - 2026-03-03
- `job.name` always `'default'` for scheduled jobs — When jobs were created via `Queue#upsertJobScheduler`, the `name` from `jobTemplate` was not embedded in the cron job data, so the worker fell back to `'default'`. The name is now embedded in the data, matching `Queue.add()` behavior. (Discussion #23)
- Regression test for scheduler job name passthrough (`test/bug-23-scheduler-job-name.test.ts`)
- Added SandboxedWorker Options Reference table
- Added SandboxedWorker Event Reference table with types
- Clarified which events are not available on SandboxedWorker (`stalled`, `drained`, `cancelled`)
- Added tip about increasing `maxMemory` for large file processing
- Fixed missing `await` on `worker.start()` calls
- Improved Worker vs SandboxedWorker comparison table
[2.6.1] - 2026-03-03
- `Queue#upsertJobScheduler` ignoring timezone — The `RepeatOpts` interface was missing the `timezone` field, causing a TypeScript error when setting it. Additionally, embedded mode hardcoded `timezone: 'UTC'` and TCP mode did not forward the timezone to the server. IANA timezone strings (e.g., `"Europe/Rome"`, `"America/New_York"`) are now accepted and passed through. (#22)
- Regression test for scheduler timezone passthrough (`test/bug-22-scheduler-timezone.test.ts`)
[2.6.0] - 2026-03-03
- 8 new TCP command handlers — `ClearLogs`, `ExtendLock`, `ExtendLocks`, `ChangeDelay`, `SetWebhookEnabled`, `CompactMemory`, `MoveToWait`, `PromoteJobs`. These commands were already sent by the client SDK and MCP adapter but had no server-side handler, causing silent `Unknown command` errors in TCP mode. All 8 are now fully functional.
- `updateJobData`/`updateJobChildrenIds` persistence methods added to `SqliteStorage` for parent-child relationship durability.
- 20 new regression tests covering all fixes in this release.
- Expired lock requeue not updating stats — When a job's lock expired and it was requeued for retry, `requeueExpiredJob` in `lockManager.ts` did not call `shard.incrementQueued()` or `shard.notify()`. As a result, `getStats()` reported 0 waiting jobs and workers in long-poll mode did not wake up for the requeued job.
- `updateJobParent` not persisting to SQLite — `childrenIds` and `__parentId` mutations were only applied in memory, so after a server restart all parent-child flow relationships were lost. Now properly persisted via dedicated SQLite update methods.
- `getJob` returning null for completed jobs without storage — In no-SQLite mode (embedded without persistence), `getJob()` returned `null` for completed/DLQ jobs because it only checked `ctx.storage?.getJob()`. It now falls back to the `ctx.completedJobsData` in-memory map.
- MCP `UnregisterWorker` field mismatch — The MCP adapter sent `{ cmd: 'UnregisterWorker', id }` but the server expected `{ workerId }`, so worker unregistration via MCP in TCP mode always failed silently. Fixed to send the correct field name.
- `JobHeartbeat` ignoring `duration` field — When the MCP adapter sent a `JobHeartbeat` with a custom `duration`, the handler ignored it and renewed the lock with the default TTL. The lock is now extended with the requested duration via `renewJobLock()`.
[2.5.8] - 2026-03-02
- Repeat job updateData — `updateData()` now propagates to the next repeat execution. Previously, calling `updateData()` on a completed repeated job silently failed because the job had been removed from the index. A repeat chain now tracks successor job IDs so updates reach the next scheduled execution. (#16)
- Worker event IntelliSense — Worker now has typed `on()` and `once()` overloads for all 10 events (`ready`, `active`, `completed`, `failed`, `progress`, `stalled`, `drained`, `error`, `cancelled`, `closed`), providing full TypeScript autocomplete. (#15)
- `FlowJobData` type — New exported interface for flow-injected fields (`__flowParentId`, `__flowParentIds`, `__parentId`, `__parentQueue`, `__childrenIds`). `Processor<T, R>` now intersects `T` with `FlowJobData` for automatic IntelliSense in Worker callbacks. (#18)
- CLI env var auth — The CLI now reads the `BQ_TOKEN`/`BUNQUEUE_TOKEN` environment variables as a fallback when `--token` is not provided. Priority: `--token` flag > `BQ_TOKEN` > `BUNQUEUE_TOKEN`. (#13)
- Updated Worker guide with typed event reference table
- Updated Flow guide with `FlowJobData` type documentation
- Updated Queue guide with `updateData()` for repeatable jobs
- Updated CLI guide and env vars guide with `BQ_TOKEN`/`BUNQUEUE_TOKEN`
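The typed event overloads described above can be sketched with a mapped event type. This is an illustrative standalone emitter, not bunqueue's actual implementation; the event map below is an assumption for demonstration:

```typescript
// Hypothetical event map: each key is an event name, each value the listener signature.
type WorkerEvents<T> = {
  active: (job: { id: number; data: T }) => void;
  completed: (job: { id: number; data: T }, result: unknown) => void;
  failed: (job: { id: number; data: T }, err: Error) => void;
};

class TypedEmitter<T> {
  private listeners = new Map<string, Function[]>();

  // The callback type is looked up from the event map, so editors can
  // autocomplete event names and infer callback parameter types.
  on<K extends keyof WorkerEvents<T>>(event: K, cb: WorkerEvents<T>[K]): this {
    const list = this.listeners.get(event as string) ?? [];
    list.push(cb);
    this.listeners.set(event as string, list);
    return this;
  }

  emit<K extends keyof WorkerEvents<T>>(
    event: K,
    ...args: Parameters<WorkerEvents<T>[K]>
  ): void {
    for (const cb of this.listeners.get(event as string) ?? []) cb(...args);
  }
}
```

With this shape, `emitter.on('completed', (job) => ...)` infers `job.data` as `T` without any manual annotation, which is the IntelliSense effect the release notes describe.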
[2.5.7] - 2026-03-01
- SandboxedWorker TCP mode — SandboxedWorker now supports connecting to a remote bunqueue server via TCP, enabling crash-isolated job processing in server deployments (systemd, Docker). Pass the `connection` option to enable it.
- SandboxedWorker EventEmitter — SandboxedWorker now extends EventEmitter with full event support: `ready`, `active`, `completed`, `failed`, `progress`, `error`, `closed` (matching the regular Worker API).
- QueueOps adapter (`src/client/sandboxed/queueOps.ts`) — unified interface for embedded and TCP queue operations, keeping SandboxedWorker code clean and dual-mode.
- TCP heartbeat for SandboxedWorker — automatic lock renewal via `JobHeartbeat` commands for active jobs in TCP mode (configurable via `heartbeatInterval`).
- TCP integration test for SandboxedWorker (`scripts/tcp/test-sandboxed-worker.ts`)
- 8 new unit tests for SandboxedWorker events and TCP constructor
- Updated Worker guide with SandboxedWorker TCP mode section and events documentation
- Updated CPU-Intensive Workers guide with SandboxedWorker TCP example
[2.5.6] - 2026-02-27
- 3 new TCP commands for MCP protocol optimization (73 tools total):
  - `CronGet` — fetch a single cron job by name instead of listing all and filtering client-side
  - `GetChildrenValues` — batch-fetch children return values in a single command instead of N+1 queries
  - `StorageStatus` — return real disk/storage health from the server instead of hardcoded `diskFull: false`
- 9 new tests for the 3 TCP commands (`test/tcp-new-commands.test.ts`)
- MCP TCP `getCron(name)` — now uses the dedicated `CronGet` command instead of fetching all crons and filtering client-side
- MCP TCP `getChildrenValues(id)` — now uses the dedicated `GetChildrenValues` command instead of 1 + 2N queries (GetJob parent + GetResult/GetJob per child)
- MCP TCP `getStorageStatus()` — now uses the dedicated `StorageStatus` command instead of returning hardcoded `{ diskFull: false }`
[2.5.5] - 2026-02-26
- TCP client auth state corruption — `TcpClient.doConnect()` set `connected = true` before `authenticate()` completed. If authentication failed, the client remained in a corrupted state (`connected = true` with no valid session), causing subsequent operations to silently fail. Connection state is now set only after successful authentication, with proper cleanup on failure.
- SEO overhaul — keyword-rich titles, optimized descriptions, AI keywords, sitemap priorities
[2.5.4] - 2026-02-24
- 4 MCP Flow Tools — job workflow orchestration via MCP (70 tools total):
  - `bunqueue_add_flow` — create flow trees with parent/children dependencies (BullMQ v5 compatible)
  - `bunqueue_add_flow_chain` — sequential pipelines: A → B → C
  - `bunqueue_add_flow_bulk_then` — fan-out/fan-in: parallel jobs → final merge
  - `bunqueue_get_flow` — retrieve flow trees with full dependency graph
[2.5.3] - 2026-02-24
- 3 MCP Prompts for AI agents — pre-built diagnostic templates:
  - `bunqueue_health_report` — comprehensive server health report with severity levels
  - `bunqueue_debug_queue` — deep diagnostic of a specific queue
  - `bunqueue_incident_response` — step-by-step triage playbook for "jobs not processing"
- MCP graceful shutdown — `server.close()` now awaited before exit
- MCP `getStorageStatus()` TCP — verifies server reachability instead of returning a hardcoded response
- MCP `getChildrenValues()` TCP — parallel fetch with `Promise.all` instead of sequential N+1
- MCP resource error format — includes `isError: true`, consistent with tool errors
- MCP pool size — configurable via `BUNQUEUE_POOL_SIZE` env var (default: 2)
[2.5.2] - 2026-02-24
- TCP deduplication — `jobId` deduplication now works correctly in TCP mode. The auto-batcher was sending `jobId` instead of `customId` in PUSHB commands, causing the server to skip deduplication for all batched operations (#10)
- CLI `--host` and `-p` flags — `bunqueue start --host 127.0.0.1 -p 6666` now correctly binds to the specified host and port. Previously, `parseGlobalOptions()` consumed these flags as global options, removing them before the server could use them (#9)
- Docker healthcheck — Changed the healthcheck URL from `localhost` to `127.0.0.1` to avoid IPv6 resolution issues in Alpine containers (#7)
- TCP ping health check — Fixed ping response parsing from `response.pong` to `response.data.pong`, matching the actual server response structure (#5)
- Tests for PUSHB deduplication (same-batch and cross-batch)
- Tests for CLI server argument re-injection (`--host`, `-p`, `--host=VALUE`, `--port=VALUE`)
- Test for ping response structure validation
- E2E TCP deduplication test script (`scripts/tcp/test-dedup-tcp.ts`)
- Updated deployment guide healthcheck example (`localhost` → `127.0.0.1`)
- Clarified that `jobId` deduplication works in both embedded and TCP modes
- Added `--host` flag example to CLI start command reference
[2.5.1] - 2026-02-23
- MCP error handling — All 66 tool handlers now wrapped with `withErrorHandler`, which catches backend exceptions and returns structured `{ error: "message" }` responses with `isError: true` instead of raw stack traces
- MCP TCP connection — `createBackend()` is now async and properly awaits the TCP connection. Previously it used fire-and-forget (`void backend.connect()`), which silently swallowed connection failures
- MCP not-found responses — `bunqueue_get_job`, `bunqueue_get_job_by_custom_id`, `bunqueue_get_progress`, and `bunqueue_get_cron` now return `isError: true` when the resource is not found
- `src/mcp/tools/withErrorHandler.ts` — Reusable error boundary for MCP tool handlers
- 39 new MCP backend tests (75 total) — webhooks, worker management, monitoring, batch operations, heartbeat, progress, full lifecycle
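The error-boundary pattern described above can be sketched as follows. This is a minimal illustration of the idea; the real `withErrorHandler` in bunqueue may differ in shape, and the `ToolResult` type here is an assumption modeled on typical MCP tool responses:

```typescript
// Hypothetical MCP-style tool result: text content plus an isError flag.
type ToolResult = { content: Array<{ type: "text"; text: string }>; isError?: boolean };
type ToolHandler<A> = (args: A) => Promise<ToolResult>;

function withErrorHandler<A>(handler: ToolHandler<A>): ToolHandler<A> {
  return async (args: A) => {
    try {
      return await handler(args);
    } catch (err) {
      // Return a structured error instead of letting a raw stack trace
      // propagate to the AI agent.
      const message = err instanceof Error ? err.message : String(err);
      return {
        content: [{ type: "text", text: JSON.stringify({ error: message }) }],
        isError: true,
      };
    }
  };
}
```

Wrapping every handler once at registration time keeps the individual tool implementations free of repetitive try/catch blocks.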
[2.5.0] - 2026-02-21
Section titled “[2.5.0] - 2026-02-21”Changed
- MCP server rewrite — Upgraded from a custom implementation to the official `@modelcontextprotocol/sdk` (v1.26.0) for full protocol compliance
- 66 tools organized across 10 domain-specific files (jobTools, jobMgmtTools, consumptionTools, queueTools, dlqTools, cronTools, rateLimitTools, webhookTools, workerMgmtTools, monitoringTools)
- 5 MCP resources for read-only AI context (stats, queues, crons, workers, webhooks)
- Dual-mode backend — Embedded (direct SQLite) and TCP (remote server) via the `McpBackend` adapter interface
- TCP mode for MCP server — connect to a remote bunqueue server via `BUNQUEUE_MODE=tcp`
- AI agent documentation and use cases
- MCP configuration guides for Claude Desktop, Claude Code, Cursor, and Windsurf
[2.4.8] - 2026-02-16
- `getJobs({ state: 'completed' })` now correctly returns completed jobs instead of empty results
[2.4.7] - 2026-02-14
Section titled “[2.4.7] - 2026-02-14”Performance
- Event-driven cron scheduler - Replaced 1s `setInterval` polling with a precise `setTimeout` that wakes exactly when the next cron is due. Zero wasted ticks between executions:

| Scenario | Before | After |
| --- | --- | --- |
| 1 cron every 5min | 300 ticks/5min (299 wasted) | 1 tick/5min |
| 0 crons registered | 1 tick/sec (all wasted) | 0 ticks |
| Cron in 3 hours | 10,800 wasted ticks | 1 tick at exact time |

- A 60s `setInterval` safety fallback catches edge cases (timer drift, missed events). Zero functional changes, zero API changes.
- `scripts/embedded/test-cron-event-driven.ts` - Operational test verifying cron timer precision
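The wake-exactly-when-due approach above can be sketched like this. Names and the fixed re-arm interval are illustrative, not bunqueue internals:

```typescript
// Instead of ticking every second, arm a single timer for the earliest due
// entry, and re-arm after each fire.
type CronEntry = { name: string; nextRun: number; run: () => void };

class EventDrivenScheduler {
  private timer: ReturnType<typeof setTimeout> | null = null;
  constructor(private entries: CronEntry[]) {}

  scheduleNext(now = Date.now()): void {
    if (this.timer) clearTimeout(this.timer);
    this.timer = null;
    if (this.entries.length === 0) return; // zero crons registered → zero ticks
    const next = this.entries.reduce((a, b) => (a.nextRun < b.nextRun ? a : b));
    const delay = Math.max(0, next.nextRun - now);
    this.timer = setTimeout(() => {
      next.run();
      next.nextRun += 5 * 60_000; // illustrative fixed 5-minute interval
      this.scheduleNext();        // re-arm for the new earliest entry
    }, delay);
  }

  stop(): void {
    if (this.timer) clearTimeout(this.timer);
    this.timer = null;
  }
}
```

A production version also needs the safety fallback the release notes mention (a coarse interval that re-runs `scheduleNext()` to catch timer drift or missed events).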
[2.4.6] - 2026-02-14
Section titled “[2.4.6] - 2026-02-14”Performance
- Event-driven dependency resolution - Replaced 100ms `setInterval` polling with a microtask-coalesced flush triggered on job completion. Dependency chain latency drops from hundreds of milliseconds to microseconds:

| Scenario | Before (P50) | After (P50) | Speedup |
| --- | --- | --- | --- |
| Single dep (A→B) | 100.05ms | 12.5µs | ~8,000x |
| Chain (4 levels) | 300.43ms | 28.2µs | ~10,700x |
| Fan-out (1→5) | 100.11ms | 31.0µs | ~3,200x |

- The previous 100ms interval is now a 30s safety fallback. Zero functional changes, zero API changes.
- Bonus: less CPU at idle (no more 10 calls/sec to `processPendingDependencies` when the queue is empty).
- `src/benchmark/dependency-latency.bench.ts` - Benchmark for dependency chain resolution latency
- `src/application/taskErrorTracking.ts` - Extracted error tracking for reuse across modules
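The microtask-coalescing technique above can be sketched as follows (illustrative names, not bunqueue's actual resolver). Many completions in the same tick schedule a single flush, so dependents are promoted right after the current batch of acks instead of waiting for a polling interval:

```typescript
class DependencyResolver {
  private dirty = false;        // true while a flush is queued for this tick
  public flushCount = 0;
  private pending: string[] = [];

  onJobCompleted(jobId: string): void {
    this.pending.push(jobId);
    if (this.dirty) return;     // a flush is already scheduled; coalesce
    this.dirty = true;
    queueMicrotask(() => {
      this.dirty = false;
      this.flush();
    });
  }

  private flush(): void {
    this.flushCount++;
    // A real implementation would promote jobs whose dependencies are now met.
    this.pending = [];
  }
}
```

Because `queueMicrotask` runs before the event loop yields to timers or I/O, the latency between a parent's ack and its children becoming runnable is microseconds rather than a poll interval.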
[2.4.5] - 2026-02-14
- Backoff jitter - `calculateBackoff()` now applies jitter to prevent a thundering herd when many jobs retry simultaneously. Exponential backoff uses ±50% jitter; fixed backoff uses ±20% jitter around the configured delay.
- Backoff max cap - Retry delays are now capped at 1 hour (`DEFAULT_MAX_BACKOFF = 3,600,000`ms) by default. Previously, attempt 20 with a 1000ms base produced ~12-day delays. Configurable via `BackoffConfig.maxDelay`.
- Recovery backoff bypass - Startup recovery now uses `calculateBackoff(job)` instead of an inline exponential formula, correctly respecting `backoffConfig` (e.g., `{ type: 'fixed', delay: 5000 }` was ignored during recovery).
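A hedged sketch of the jitter and cap behavior described above. The real `calculateBackoff` signature in bunqueue may differ; this standalone version just demonstrates the arithmetic:

```typescript
const DEFAULT_MAX_BACKOFF = 3_600_000; // 1 hour cap

interface BackoffConfig {
  type: "exponential" | "fixed";
  delay: number;      // base delay in ms
  maxDelay?: number;  // optional cap, defaults to 1 hour
}

function calculateBackoff(cfg: BackoffConfig, attempt: number): number {
  const max = cfg.maxDelay ?? DEFAULT_MAX_BACKOFF;
  if (cfg.type === "fixed") {
    // ±20% jitter around the configured delay
    const jitter = 1 + (Math.random() * 0.4 - 0.2);
    return Math.min(cfg.delay * jitter, max);
  }
  // exponential: base * 2^(attempt-1), then ±50% jitter, then cap
  const raw = cfg.delay * 2 ** (attempt - 1);
  const jitter = 0.5 + Math.random(); // multiplier in [0.5, 1.5)
  return Math.min(raw * jitter, max);
}
```

Without the cap, attempt 20 at a 1000ms base is 1000 × 2¹⁹ ≈ 524 million ms (about 6 days before jitter), which is why the 1-hour default matters.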
[2.4.3] - 2026-02-14
- Batch push now wakes all waiting workers - `pushJobBatch` previously called `notify()` only once, so only 1 of N waiting workers woke up immediately; the others had to wait for their poll timeout (up to 30s with long-poll). Now each inserted job triggers a separate notification, waking all idle workers instantly.
- Pending notifications counter - `WaiterManager.pendingNotification` was a boolean flag, silently losing notifications when multiple pushes occurred with no waiting workers. Changed to an integer counter (`pendingNotifications`) so each notification is tracked and consumed individually.
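The boolean-vs-counter difference can be sketched as below (illustrative, not bunqueue's actual `WaiterManager`). With a counter, N pushes that arrive while no worker is waiting wake N future waiters instead of just one:

```typescript
class WaiterManager {
  private pendingNotifications = 0;       // integer counter, not a boolean
  private waiters: Array<() => void> = [];

  notify(): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter();                 // wake one waiting worker
    else this.pendingNotifications++;     // nobody waiting: store it
  }

  async waitForJob(): Promise<void> {
    if (this.pendingNotifications > 0) {
      this.pendingNotifications--;        // consume exactly one stored notification
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }
}
```

With a boolean flag, three `notify()` calls before any `waitForJob()` would collapse into one wake-up; the counter preserves all three.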
[2.4.2] - 2026-02-13
- CPU-Intensive Workers guide - New dedicated docs page for handling CPU-heavy jobs over TCP
- Explains the ping health check failure chain that causes job loss after ~90s of CPU load
- Connection tuning: `pingInterval: 0`, `commandTimeout: 60000`
- Non-blocking CPU patterns with an `await Bun.sleep(0)` yield
await Bun.sleep(0)yield - Default timeouts reference table
- SandboxedWorker as alternative for truly CPU-bound work
- CPU stress test script - `scripts/stress-cpu-intensive.ts` (500 jobs, 5 CPU task types, concurrency 3)
[2.4.1] - 2026-02-12
Section titled “[2.4.1] - 2026-02-12”Changed
- Codebase refactoring - Split 6 large files exceeding the 300-line limit into smaller focused modules:
  - `src/shared/lru.ts` (643 lines) → barrel re-export + 5 modules: `lruMap.ts`, `lruSet.ts`, `boundedSet.ts`, `boundedMap.ts`, `ttlMap.ts`
  - `src/client/jobConversion.ts` (499 lines) → 269 lines + `jobConversionTypes.ts`, `jobConversionHelpers.ts`
  - `src/domain/queue/shard.ts` (554 lines) → 484 lines + `waiterManager.ts`, `shardCounters.ts`
  - `src/application/queueManager.ts` (820 lines) → 774 lines (moved `getQueueJobCounts` to `statsManager.ts`)
  - `src/client/worker/worker.ts` (843 lines) → 596 lines + `workerRateLimiter.ts`, `workerHeartbeat.ts`, `workerPull.ts`
- All barrel re-exports preserve backward compatibility — zero breaking changes
- 12 new files created, 6 files modified
[2.4.0] - 2026-02-11
Section titled “[2.4.0] - 2026-02-11”- Auto-batching for
queue.add()over TCP - Transparently batches concurrentadd()calls intoPUSHBcommands- Zero overhead for sequential
awaitusage (flush immediately when idle) - ~3x speedup for concurrent adds (buffers during in-flight flush)
- Configurable:
autoBatch: { maxSize: 50, maxDelayMs: 5 }(defaults) - Durable jobs bypass the batcher (sent as individual PUSH)
- Disable with
autoBatch: { enabled: false }
- Zero overhead for sequential
- 306 new tests covering previously untested modules
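The coalescing behavior described above can be sketched with a tiny micro-batcher. All names here are hypothetical; bunqueue's batcher additionally handles `maxSize` limits and durable-job bypass:

```typescript
// add() resolves per item; items queued while a flush is pending are sent
// together in the next PUSHB-style batch.
class MicroBatcher<T> {
  private buffer: Array<{ item: T; resolve: (v: number) => void }> = [];
  private flushing = false;
  public batches: T[][] = [];   // stands in for PUSHB commands sent to the server

  constructor(private maxDelayMs = 5) {}

  add(item: T): Promise<number> {
    return new Promise((resolve) => {
      this.buffer.push({ item, resolve });
      if (!this.flushing) this.scheduleFlush();
    });
  }

  private scheduleFlush(): void {
    this.flushing = true;
    setTimeout(() => {
      const batch = this.buffer.splice(0, this.buffer.length);
      this.batches.push(batch.map((b) => b.item)); // one "PUSHB" per flush
      batch.forEach((b, i) => b.resolve(i));        // per-item acknowledgment
      this.flushing = false;
      if (this.buffer.length > 0) this.scheduleFlush();
    }, this.maxDelayMs);
  }
}
```

Sequential `await queue.add(...)` usage only ever has one item in the buffer, so each flush carries a single job and adds no latency; concurrent adds naturally pile into one batch.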
[2.3.1] - 2026-02-08
- Non-numeric job IDs - Allow non-numeric job IDs in HTTP routes
- Updated HTTP route tests to match non-numeric job ID support
[2.3.0] - 2026-02-06
- Latency Histograms - Prometheus-compatible histograms for push, pull, and ack operations
- Fixed bucket boundaries: 0.1ms to 10,000ms (15 buckets)
- Full exposition format: `_bucket{le="..."}`, `_sum`, `_count`
- Percentile calculation (p50, p95, p99) for SLO tracking
- New files: `src/shared/histogram.ts`, `src/application/latencyTracker.ts`
- Per-Queue Metric Labels - Prometheus labels for per-queue drill-down
  - `bunqueue_queue_jobs_waiting{queue="..."}` (waiting, delayed, active, dlq)
  - Enables Grafana filtering and alerting per queue name
- Throughput Tracker - Real-time EMA-based rate tracking
  - `pushPerSec`, `pullPerSec`, `completePerSec`, `failPerSec`
  - O(1) per observation, zero GC pressure
  - Replaces placeholder zeros in the `/stats` endpoint
  - New file: `src/application/throughputTracker.ts`
- LOG_LEVEL Runtime Filtering - The `LOG_LEVEL` env var now works at runtime
  - Levels: `debug`, `info` (default), `warn`, `error`
  - Priority-based filtering with early return
- 39 new telemetry tests across 5 test files:
  - `test/histogram.test.ts` (9 tests)
  - `test/latencyTracker.test.ts` (7 tests)
  - `test/perQueueMetrics.test.ts` (7 tests)
  - `test/throughputTracker.test.ts` (7 tests)
  - `test/telemetry-e2e.test.ts` (9 E2E integration tests)
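A fixed-bucket histogram with percentile estimation, as described above, can be sketched like this. The bucket bounds below are illustrative; the release notes state bunqueue uses 15 buckets from 0.1ms to 10,000ms:

```typescript
class LatencyHistogram {
  private counts: number[];
  public sum = 0;
  public count = 0;

  constructor(private bounds: number[]) {
    this.counts = new Array(bounds.length + 1).fill(0); // +1 for the +Inf bucket
  }

  observe(ms: number): void {
    let i = 0;
    while (i < this.bounds.length && ms > this.bounds[i]) i++;
    this.counts[i]++;           // cumulative-style export would sum these up
    this.sum += ms;
    this.count++;
  }

  // Returns the upper bound of the bucket containing the q-th quantile,
  // which is how bucketed histograms approximate p50/p95/p99.
  percentile(q: number): number {
    const target = Math.ceil(this.count * q);
    let seen = 0;
    for (let i = 0; i < this.counts.length; i++) {
      seen += this.counts[i];
      if (seen >= target) return i < this.bounds.length ? this.bounds[i] : Infinity;
    }
    return Infinity;
  }
}
```

For Prometheus exposition, each bound maps to a `_bucket{le="..."}` line with cumulative counts, plus `_sum` and `_count`.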
Changed
- `/stats` endpoint now returns real throughput and latency values
- Monitoring docs updated with per-queue metrics, histogram examples, and logging section
- HTTP API docs updated with new Prometheus output format
Performance
- Telemetry overhead: ~0.003% (~25ns per operation via `Bun.nanoseconds()`)
- Benchmark results unchanged: 197K push/s (embedded), 39K push/s (TCP)
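The EMA-based rate tracking mentioned in this release can be sketched as follows. This is a guess at the general technique, not bunqueue's actual `ThroughputTracker`; the half-life parameterization is my own assumption:

```typescript
// Each observation is O(1): decay the old rate by elapsed time, then fold in
// the new events' instantaneous rate.
class EmaRateTracker {
  private rate = 0;      // events per second
  private lastTs: number;

  constructor(private halfLifeMs = 5000, now = Date.now()) {
    this.lastTs = now;
  }

  observe(count = 1, now = Date.now()): void {
    const dt = Math.max(1, now - this.lastTs);
    const decay = Math.pow(0.5, dt / this.halfLifeMs);
    this.rate = this.rate * decay + (count / (dt / 1000)) * (1 - decay);
    this.lastTs = now;
  }

  perSec(): number {
    return this.rate;
  }
}
```

No array of timestamps is kept, which is what makes it O(1) per observation with zero GC pressure.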
[2.1.8] - 2026-02-06
- pushJobBatch event emission - `pushJobBatch` was silently dropping event broadcasts, causing subscribers and webhooks to miss all batch-pushed jobs. Added a broadcast loop after batch insert to match single `pushJob` behavior.
- 4 regression tests for batch push event emission fix
Changed
- Navbar simplified to show only the logo without title text
[2.1.7] - 2026-02-05
- WriteBuffer silent data loss during shutdown - `WriteBuffer.stop()` swallowed flush errors and silently dropped buffered jobs. Added `reportLostJobs()` to notify via the `onCriticalError` callback when jobs cannot be persisted during shutdown.
- Queue name consistency in TCP tests - Fixed port hardcoding in the queue-name-consistency test.
- 2,664 new tests across 37 files - Comprehensive test coverage increase from 1,083 to 3,747 tests (+246%) with zero failures. Coverage spans core operations, data structures, managers, client TCP layer, server handlers, domain types, MCP handlers, and more.
[2.1.6] - 2026-02-05
- S3 backup hardening - 10 bug fixes with 33 new tests:
- Replace silent catch in cleanup with proper logging
- Reject retention < 1 and intervalMs < 60s in config validation
- Validate SQLite magic bytes before restore to prevent data corruption
- Guard cleanup against retention=0 deleting all backups
- Add S3 list pagination to handle >100 backups
- Run WAL checkpoint before backup to include uncheckpointed data
- Replace blocking gzipSync/gunzipSync with async CompressionStream
- Flaky sandboxedWorker concurrent test - Poll all 4 job results in parallel instead of sequentially to avoid exceeding the 5s test timeout.
- 33 new S3 backup tests covering config validation, backup/restore operations, cleanup, and manager lifecycle
- Documentation for gzip compression, SHA256 checksums, `.meta.json` files, scheduling details, AWS env var aliases, and restore safety notes
[2.1.5] - 2026-02-05
- uncaughtException and unhandledRejection handlers - Previously, any uncaught error in background tasks or unhandled promise rejection would crash the server immediately without cleanup (write buffer not flushed, SQLite not closed, locks not released). The server now performs a graceful shutdown: logs the error with stack trace, stops the TCP/HTTP servers, waits for active jobs, flushes the write buffer, and exits cleanly.
- Broken GitHub links in documentation (missing `/bunqueue` in paths)
- Stray separator in index.mdx causing a build error
Changed
- Migrated documentation from GitHub Pages to Vercel deployment
- SEO optimization across all 45 pages with improved titles and descriptions
- Documentation errors fixed, missing content added, and navbar modernized
[2.1.4] - 2026-02-05
Section titled “[2.1.4] - 2026-02-05”Changed
- README split into Embedded and Server mode sections
- Added Docker server mode quick start with persistence documentation
[2.1.3] - 2026-02-05
- Type safety improvements across the client SDK
- Deployment modes section and fixed quick start examples in documentation
Changed
- README improved with use cases, benchmarks, and BullMQ comparison
[2.1.2] - 2026-02-04
- Queue name consistency - Fixed benchmark tests using different queue names for the worker and queue in both embedded and TCP modes
Changed
- Stats interval changed to 5 minutes with timestamp
- Removed verbose info/warn logs, keeping only errors
- Downgraded TypeScript to 5.7.3 for CI compatibility
- Queue name consistency tests to prevent regression
- Monitoring documentation added to sidebar Production section
[2.1.1] - 2026-02-04
- Prometheus + Grafana Monitoring Stack - Complete observability setup:
- Docker Compose profile for one-command monitoring deployment
- Pre-configured Prometheus scraping with 5s interval
- Comprehensive Grafana dashboard with 6 panel rows:
- Overview: Waiting, Delayed, Active, Completed, DLQ, Workers, Cron, Uptime
- Throughput: Jobs/sec graphs, queue depth over time
- Success/Failure: Rate gauges, completed vs failed charts
- Workers: Count, throughput, utilization gauge
- Webhooks & Cron: Status and lifetime totals
- Alerts: Visual indicators for DLQ, failure rate, backlog, workers
- 8 pre-configured Prometheus alert rules:
  - `BunqueueDLQHigh` - DLQ > 100 for 5m (critical)
  - `BunqueueHighFailureRate` - Failure > 5% for 5m (warning)
  - `BunqueueQueueBacklog` - Waiting > 10k for 10m (warning)
  - `BunqueueNoWorkers` - No workers with waiting jobs (critical)
  - `BunqueueServerDown` - Server unreachable (critical)
  - `BunqueueLowThroughput` - < 1 job/s for 10m (warning)
  - `BunqueueWorkerOverload` - Utilization > 95% (warning)
  - `BunqueueJobsStuck` - Active jobs, no completions (warning)
- Monitoring Documentation - New guide at `/guide/monitoring/`
Changed
- Docker Compose now supports `--profile monitoring` for the optional stack
[2.1.0] - 2026-02-04
Section titled “[2.1.0] - 2026-02-04”Performance
- TCP Pipelining - Major throughput improvement for TCP client operations:
- Client-side: Multiple commands in flight per connection (up to 100 by default)
  - Server-side: Parallel command processing with `Promise.all()`
  - reqId-based response matching for correct command-response pairing
- 125,000 ops/sec in pipelining benchmarks (vs ~20,000 before)
  - Configurable via `pipelining: boolean` and `maxInFlight: number` options
- SQLite indexes for high-throughput operations - Added 4 new indexes for 30-50% faster queries:
  - `idx_jobs_state_started`: Stall detection now O(log n) instead of an O(n) table scan
  - `idx_jobs_group_id`: Fast lookup for group operations
  - `idx_jobs_pending_priority`: Compound index for priority-ordered job retrieval
  - `idx_dlq_entered_at`: DLQ expiration cleanup now O(log n)
- Date.now() caching in pull loop - Reduced syscalls by caching timestamp per iteration (+3-5% throughput)
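The reqId-based response matching behind pipelining can be sketched as follows. The frame shapes are hypothetical (the real protocol uses msgpackr binary frames); the point is that each in-flight command is keyed by reqId, so responses may arrive out of order and still resolve the correct promise:

```typescript
type Frame = { reqId: number; data?: unknown; error?: string };

class PipelinedClient {
  private nextReqId = 1;
  private inFlight = new Map<number, { resolve: (v: unknown) => void; reject: (e: Error) => void }>();

  constructor(
    private sendFrame: (frame: { reqId: number; cmd: string }) => void,
    private maxInFlight = 100,
  ) {}

  send(cmd: string): Promise<unknown> {
    if (this.inFlight.size >= this.maxInFlight) {
      return Promise.reject(new Error("too many in-flight commands"));
    }
    const reqId = this.nextReqId++;
    return new Promise((resolve, reject) => {
      this.inFlight.set(reqId, { resolve, reject });
      this.sendFrame({ reqId, cmd }); // no waiting: next command can go out immediately
    });
  }

  onResponse(frame: Frame): void {
    const pending = this.inFlight.get(frame.reqId);
    if (!pending) return; // late or unknown reqId: ignore
    this.inFlight.delete(frame.reqId);
    if (frame.error) pending.reject(new Error(frame.error));
    else pending.resolve(frame.data);
  }
}
```

Without reqIds, a pipelined connection must assume strictly ordered responses; with them, the server is free to process frames in parallel.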
- Hello command for protocol version negotiation (`cmd: 'Hello'`)
- Protocol version 2 with pipelining capability support
- Semaphore utility for server-side concurrency limiting (`src/shared/semaphore.ts`)
- Comprehensive pipelining test suites:
  - `test/protocol-reqid.test.ts` - 7 tests for reqId handling
  - `test/client-pipelining.test.ts` - 7 tests for client pipelining
  - `test/server-pipelining.test.ts` - 7 tests for server parallel processing
  - `test/backward-compat.test.ts` - 10 tests for backward compatibility
- Fair benchmark comparison (`bench/comparison/run.ts`):
  - Both bunqueue and BullMQ use an identical parallel push strategy
  - Queue cleanup with `obliterate()` between tests
  - Results: 1.3x Push, 3.2x Bulk Push, 1.7x Process vs BullMQ
- Comprehensive benchmark (`bench/comprehensive.ts`):
  - Embedded vs TCP mode comparison at scales [1K, 5K, 10K, 50K]
- Log suppression for clean output
- Peak results: 287K ops/sec (Embedded Bulk), 149K ops/sec (TCP Bulk)
- Embedded mode is 2-4x faster than TCP across all operations
- New ConnectionOptions - Added `pingInterval`, `commandTimeout`, `pipelining`, `maxInFlight` to the public API
- SQLITE_BUSY under high concurrency - Added `PRAGMA busy_timeout = 5000` to wait for locks instead of failing immediately
- "Database has closed" errors during shutdown - Added a `stopped` flag to WriteBuffer to prevent flush attempts after `stop()`
- Critical: Worker pendingJobs race condition - Concurrent `tryProcess()` calls could overwrite each other's job buffers, causing ~30% job loss under high concurrency. Existing buffered jobs are now preserved when pulling new batches.
- Connection options not passed through - Worker, Queue, and FlowProducer now correctly pass `pingInterval`, `commandTimeout`, `pipelining`, and `maxInFlight` options to the TCP connection pool.
Changed
- Schema version bumped to 5 (auto-migrates existing databases)
- TCP client now includes `reqId` in all commands for response matching
- Server processes multiple frames in parallel (max 50 concurrent per connection)
- Documentation: Rewrote comparison page with real benchmark data and methodology explanation
[2.0.9] - 2026-02-03
Section titled “[2.0.9] - 2026-02-03”- Critical: Memory leak in EventsManager - Cancelled waiters in
waitForJobCompletion()were never removed from thecompletionWaitersmap on timeout. Now properly cleaned up when timeout fires. - Critical: Lost notification TOCTOU race - Fixed race condition in pull.ts where
notify()could fire betweentryPullFromShard()returning null andwaitForJob()being called. AddedpendingNotificationflag to Shard to capture notifications when no waiters exist. - Critical: WriteBuffer data loss - Added exponential backoff (100ms → 30s), max 10 retries, critical error callback,
stopGracefully()method, and enhanced error callback with retry information. Previously, persistent errors caused infinite retries and shutdown lost pending jobs. - Critical: CustomIdMap race condition - Concurrent pushes with same customId could create duplicates. Moved customIdMap check inside shard write lock for atomic check-and-insert.
- Comprehensive test suites for all bug fixes:
  - `test/bug-memory-leak-waiters.test.ts` - 5 tests verifying the memory leak fix
  - `test/bug-lost-notification.test.ts` - 4 tests verifying the notification fix
  - `test/bug-writebuffer-dataloss.test.ts` - 10 tests verifying the WriteBuffer fix
  - `test/bug-verification-remaining.test.ts` - 7 tests verifying the CustomId fix and JS concurrency model
[2.0.3] - 2026-02-02
Section titled “[2.0.3] - 2026-02-02”Changed
- Major refactor: Split queue.ts into a modular architecture (1955 → 485 lines)
- Follows single responsibility principle with 14 focused modules
- New modules: operations/add.ts, operations/counts.ts, operations/query.ts, operations/management.ts, operations/cleanup.ts, operations/control.ts
- New modules: jobMove.ts, jobProxy.ts, bullmqCompat.ts, scheduler.ts, dlq.ts, stall.ts, rateLimit.ts, deduplication.ts, workers.ts, queueTypes.ts
- All 894 unit tests, 25 TCP test suites, and 32 embedded test suites pass
- `getJob()` now properly awaits the async `manager.getJob()` call
- `getJobCounts()` now uses queue-specific counts instead of global stats
- `promoteJobs()` implements correct iteration over delayed jobs
- `addBulk()` properly passes BullMQ v5 options (lifo, stackTraceLimit, keepLogs, etc.)
- `toPublicJob()` used for full job options support in `getJob()`
- `extendJobLock()` passes the token parameter correctly
[2.0.2] - 2026-02-02
- Critical: Complete recovery logic for deduplication after restart - Fixed all recovery scenarios that caused duplicate jobs after server restart:
  - jobId deduplication (`customIdMap`) - Now properly populated on recovery
  - uniqueKey TTL deduplication - Now restored with TTL settings via `registerUniqueKeyWithTtl()`
  - Dependency recovery - Now checks the SQLite `job_results` table (not just in-memory `completedJobs`)
  - Counter consistency - Fixed `incrementQueued()` only being called for main queue jobs, not `waitingDeps`
- `loadCompletedJobIds()` method in SQLite storage for dependency recovery
- `hasResult()` method to check if a job result exists in SQLite
- Comprehensive recovery test suite (`test/recoveryLogic.test.ts`) with 8 tests covering all scenarios
[2.0.1] - 2026-02-02
- Critical: jobId deduplication not working after restart - The `customIdMap` was not populated when recovering jobs from SQLite on server startup. This caused `getDeduplicationJobId()` to return `null` and allowed duplicate jobs with the same `jobId` to be created.
[2.0.0] - 2026-02-02
- Complete BullMQ v5 API Compatibility - Full feature parity with BullMQ v5
  - Worker Advanced Methods
    - `rateLimit(expireTimeMs)` - Apply rate limiting to the worker
    - `isRateLimited()` - Check if the worker is currently rate limited
    - `startStalledCheckTimer()` - Start the stalled job check timer
    - `delay(ms, abortController?)` - Delay worker processing with optional abort
  - Job Advanced Methods
    - `discard()` - Mark a job as discarded
    - `getFailedChildrenValues()` - Get failed children job values
    - `getIgnoredChildrenFailures()` - Get ignored children failures
    - `removeChildDependency()` - Remove a child dependency from the parent
    - `removeDeduplicationKey()` - Remove a deduplication key
    - `removeUnprocessedChildren()` - Remove unprocessed children jobs
  - JobOptions
    - `continueParentOnFailure` - Continue the parent job when a child fails
    - `ignoreDependencyOnFailure` - Ignore the dependency on failure
    - `timestamp` - Custom job timestamp
  - DeduplicationOptions
    - `extend` - Extend TTL on duplicate
    - `replace` - Replace the existing job on duplicate
- Comprehensive Test Coverage - 27 unit tests + 32 embedded script tests for new features
Changed
- Major version bump to 2.0.0 reflecting complete BullMQ v5 compatibility
- Updated TypeScript types for all new features
[1.9.9] - 2026-02-01
- Comprehensive Functional Test Suite - 28 new test files covering all major features
- 14 embedded mode tests + 14 TCP mode tests
- Tests for: advanced DLQ, job management, monitoring, rate limiting, stall detection, webhooks, queue groups, and more
- All 24 embedded test suites pass (143/143 individual tests)
Changed
- BullMQ-Style Idempotency - The `jobId` option now returns the existing job instead of throwing an error
  - Duplicate job submissions are idempotent (same behavior as BullMQ)
  - Cleaner handling of retry scenarios without error handling
- Improved documentation for `jobId` deduplication behavior
- Embedded test suite now properly uses embedded mode (was incorrectly trying TCP)
- Fixed
getJobCounts()in tests to use queue-specificgetJobs()method - Fixed async
getJob()calls in job management tests - Fixed PROMOTE, CHANGE PRIORITY, and MOVE TO DELAYED test logic
[1.9.8] - 2026-01-31
Changed
- msgpackr Binary Protocol - Switched TCP protocol from JSON to msgpackr binary
- ~30% faster serialization/deserialization
- Smaller message sizes
[1.9.6] - 2026-01-31
- Durable Writes - New `durable: true` option for critical jobs
- Bypasses write buffer for immediate disk persistence
- Guarantees no data loss on process crash
- Use for payments, orders, and critical events
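The buffered-vs-durable trade-off can be illustrated with a toy write buffer. This is an assumed model of the design, not bunqueue's persistence layer; `WriteBuffer` and its methods are hypothetical names.

```typescript
// Sketch of the write-buffer trade-off: non-durable pushes sit in a buffer
// until the next flush tick, so a crash inside that window can lose them.
// durable: true bypasses the buffer and is persisted immediately.
class WriteBuffer {
  private pending: string[] = [];
  public flushed: string[] = [];

  push(job: string, opts: { durable?: boolean } = {}): void {
    if (opts.durable) {
      this.flushed.push(job); // written through: survives a crash right now
    } else {
      this.pending.push(job); // at risk until the next flush tick
    }
  }

  flush(): void {
    this.flushed.push(...this.pending);
    this.pending = [];
  }

  // Jobs that would be lost if the process crashed at this instant.
  atRisk(): string[] {
    return [...this.pending];
  }
}

const buf = new WriteBuffer();
buf.push("charge-card", { durable: true });
buf.push("send-newsletter");
console.log(buf.atRisk()); // ["send-newsletter"]: only the non-durable job
buf.flush();
console.log(buf.atRisk()); // []
```

Shrinking the flush interval from 50ms to 10ms (the Changed entry below) shrinks exactly this at-risk window for non-durable jobs.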
Changed
- Reduced write buffer flush interval from 50ms to 10ms
- Smaller data loss window for non-durable jobs
- Better balance between throughput and safety
[1.9.4] - 2026-01-31
- 5 BullMQ-Compatible Features
  - Timezone support for cron jobs - IANA timezones (e.g., “Europe/Rome”, “America/New_York”)
  - `getCountsPerPriority()` - Get job counts grouped by priority level
  - `getJobs()` with pagination - Filter by state, paginate with `start`/`end`, sort with `asc`
  - `retryCompleted()` - Re-queue completed jobs for reprocessing
  - Advanced deduplication - TTL-based unique keys with `extend` and `replace` strategies
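The counting and pagination semantics above can be sketched in plain TypeScript. This is an illustrative model assuming BullMQ-like conventions (inclusive `start`/`end` range, descending order unless `asc` is set); it is not the library's implementation.

```typescript
// Illustrative models of getCountsPerPriority() and paginated getJobs().
interface StoredJob {
  id: number;
  state: "waiting" | "completed";
  priority: number;
}

function getCountsPerPriority(jobs: StoredJob[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const j of jobs) counts[j.priority] = (counts[j.priority] ?? 0) + 1;
  return counts;
}

function getJobs(
  jobs: StoredJob[],
  state: StoredJob["state"],
  start: number,
  end: number,
  asc = false,
): StoredJob[] {
  const filtered = jobs
    .filter((j) => j.state === state)
    .sort((a, b) => (asc ? a.id - b.id : b.id - a.id));
  return filtered.slice(start, end + 1); // end is inclusive, BullMQ-style
}

const jobs: StoredJob[] = [
  { id: 1, state: "waiting", priority: 1 },
  { id: 2, state: "waiting", priority: 2 },
  { id: 3, state: "completed", priority: 1 },
  { id: 4, state: "waiting", priority: 1 },
];
console.log(getCountsPerPriority(jobs)); // 3 jobs at priority 1, 1 at priority 2
console.log(getJobs(jobs, "waiting", 0, 1, true).map((j) => j.id)); // [1, 2]
```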
Changed
- Documentation improvements
- Clear comparison table for Embedded vs TCP Server modes
- Danger box warning about mixed modes causing “Command timeout” error
- Added “Connecting from Client” section to Server guide
[1.9.3] - 2026-01-31
- Unix Socket Support - TCP and HTTP servers can now bind to Unix sockets
  - Configure via `TCP_SOCKET_PATH` and `HTTP_SOCKET_PATH` environment variables
  - CLI flags `--tcp-socket` and `--http-socket`
  - Lower latency for local connections
- Socket status line in startup banner
- Test alignment for shard drain return type
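A configuration sketch for the Unix-socket options above, assuming the server is started via a `bunqueue server` CLI entrypoint; the socket paths are illustrative.

```shell
# Via environment variables (paths are examples):
TCP_SOCKET_PATH=/tmp/bunqueue-tcp.sock \
HTTP_SOCKET_PATH=/tmp/bunqueue-http.sock \
bunqueue server

# Or via the CLI flags introduced in this release:
bunqueue server --tcp-socket /tmp/bunqueue-tcp.sock --http-socket /tmp/bunqueue-http.sock
```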
[1.9.2] - 2026-01-30
- Critical Memory Leak - Resolved `temporalIndex` leak causing 5.5M object retention after 1M jobs
  - Added `cleanOrphanedTemporalEntries()` method to Shard
  - Memory now properly released after job completion with `removeOnComplete: true`
  - `heapUsed` drops to ~6MB after processing (vs 264MB before fix)
Changed
- Improved error logging in ackBatcher flush operations
[1.9.1] - 2026-01-29
- Two-Phase Stall Detection - BullMQ-style stall detection to prevent false positives
- Jobs marked as candidates on first check, confirmed stalled on second
- Prevents requeuing jobs that complete between checks
- `stallTimeout` support in client push options
- Advanced health checks for TCP connections
- Defensive checks and cleanup for TCP pool and worker
- Server banner alignment between CLI and main.ts
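The two-phase check above can be modeled as a small state tracker. This is a sketch of the described logic only (`StallTracker` is a hypothetical name): a job must be seen inactive on two consecutive checks before it is confirmed stalled, so jobs that complete between checks are never requeued.

```typescript
// Minimal model of two-phase stall detection.
type StallState = "candidate" | "stalled";

class StallTracker {
  private candidates = new Set<string>();

  // Called once per stall-check interval with the ids of inactive jobs.
  check(inactiveIds: string[]): Map<string, StallState> {
    const result = new Map<string, StallState>();
    const inactive = new Set(inactiveIds);
    for (const id of inactive) {
      if (this.candidates.has(id)) {
        result.set(id, "stalled");   // second consecutive sighting: confirmed
      } else {
        result.set(id, "candidate"); // first sighting: flag, don't requeue yet
      }
    }
    this.candidates = inactive;      // jobs that completed simply drop out here
    return result;
  }
}

const tracker = new StallTracker();
tracker.check(["a", "b"]);           // both flagged as candidates
const second = tracker.check(["a"]); // "b" completed between checks
console.log(second.get("a")); // "stalled"
console.log(second.get("b")); // undefined: never falsely requeued
```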
Changed
- Modularized client code into separate TCP, Worker, Queue, and Sandboxed modules
[1.9.0] - 2026-01-28
- TCP Client - High-performance TCP client for remote server connections
- Connection pooling with configurable pool size
- Heartbeat keepalive mechanism
- Batch pull/ACK operations (PULLB, ACKB with results)
- Long polling support
- Ping/pong health checks
- 4.7x faster push throughput with optimized TCP client
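The pooling idea behind the throughput gain can be sketched as simple round-robin dispatch. This is illustrative only; the real client also handles heartbeats, batch PULLB/ACKB, and long polling, and `ConnectionPool` here is a hypothetical name.

```typescript
// Sketch of round-robin connection pooling: each request takes the next
// connection in turn, spreading load across the configured pool size.
class ConnectionPool<T> {
  private next = 0;
  constructor(private readonly conns: T[]) {}

  acquire(): T {
    const conn = this.conns[this.next];
    this.next = (this.next + 1) % this.conns.length;
    return conn;
  }
}

const pool = new ConnectionPool(["conn-0", "conn-1", "conn-2"]);
console.log([pool.acquire(), pool.acquire(), pool.acquire(), pool.acquire()]);
// ["conn-0", "conn-1", "conn-2", "conn-0"]
```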
Changed
- Connection pool enabled by default for TCP clients
- Improved ESLint compliance across TCP client code
[1.6.8] - 2026-01-27
- Renamed bunq to bunqueue in Dockerfile
- CLI version now read dynamically from package.json
Changed
- Centralized version in `shared/version.ts`
[1.6.7] - 2026-01-26
- Dynamic version badge in documentation
- Mobile-responsive layout improvements
- Comprehensive stress tests
[1.6.6] - 2026-01-25
- Counter updates when recovering jobs from SQLite on restart
[1.6.5] - 2026-01-24
- Production readiness improvements with critical fixes
[1.6.4] - 2026-01-23
- SQLite persistence for DLQ entries
- Client SDK persistence issues
[1.6.3] - 2026-01-22
- MCP Server - Model Context Protocol server for AI assistant integration
- Queue management tools for Claude, Cursor, and other AI assistants
- BigInt serialization handling in stats
- Deployment guide documentation corrections
[1.6.2] - 2026-01-21
- SandboxedWorker - Isolated worker processes for crash protection
- Hono and Elysia integration guides
- Section-specific OG images and sitemap
Changed
- Enhanced SEO with Open Graph and Twitter meta tags
- Improved mobile responsiveness in documentation
[1.6.1] - 2026-01-20
- Bunny ASCII art in server startup and CLI help
- Professional benchmark charts using QuickChart.io
- BullMQ vs bunqueue comparison benchmarks
Changed
- Optimized event subscriptions and batch operations
- Replaced Math.random UUID with Bun.randomUUIDv7 (10x faster)
- High-impact algorithm optimizations
[1.6.0] - 2026-01-19
- Stall Detection - Automatic recovery of unresponsive jobs
- Configurable stall interval and max stalls
- Grace period after job start
- Automatic retry or move to DLQ
- Advanced DLQ - Enhanced Dead Letter Queue
- Full metadata (reason, error, attempt history)
- Auto-retry with exponential backoff
- Filtering by reason, age, retriability
- Statistics endpoint
- Auto-purge expired entries
- Worker Heartbeats - Configurable heartbeat interval
- Repeatable Jobs - Support for recurring jobs with intervals or limits
- Flow Producer - Parent-child job relationships
- Queue Groups - Bulk operations across multiple queues
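The advanced DLQ's auto-retry with exponential backoff can be illustrated with a delay schedule. The formula is an assumption for illustration (base delay doubled per attempt, with a cap); the actual base, factor, and cap are configuration details not stated here.

```typescript
// Illustrative exponential-backoff schedule for DLQ auto-retry:
// delay = base * 2^(attempt - 1), capped at capMs. Defaults are assumed.
function retryDelayMs(attempt: number, baseMs = 1_000, capMs = 60_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

console.log([1, 2, 3, 4].map((a) => retryDelayMs(a))); // [1000, 2000, 4000, 8000]
console.log(retryDelayMs(10)); // 60000: the cap kicks in
```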
Changed
- Updated banner to “written in TypeScript”
- Version now read from package.json dynamically
- DLQ entry return type consistency
[1.5.0] - 2026-01-15
- S3 backup with configurable retention
- Support for Cloudflare R2, MinIO, DigitalOcean Spaces
- Backup CLI commands (now, list, restore, status)
Changed
- Improved backup compression
- Better error messages for S3 configuration
[1.4.0] - 2026-01-10
- Rate limiting per queue
- Concurrency limiting per queue
- Prometheus metrics endpoint
- Health check endpoint
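Per-queue rate limiting can be sketched as a fixed-window counter. This is an assumed model (max N jobs per window); the library's actual algorithm and parameters are not specified in this entry, and `QueueRateLimiter` is a hypothetical name.

```typescript
// Sketch of fixed-window rate limiting: at most `max` acquisitions per
// `windowMs`; callers that get false back off until the next window.
class QueueRateLimiter {
  private windowStart = 0;
  private count = 0;

  constructor(private readonly max: number, private readonly windowMs: number) {}

  tryAcquire(now: number): boolean {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now; // new window: reset the counter
      this.count = 0;
    }
    if (this.count >= this.max) return false; // over the limit for this window
    this.count++;
    return true;
  }
}

const limiter = new QueueRateLimiter(2, 1_000);
console.log(limiter.tryAcquire(0));     // true
console.log(limiter.tryAcquire(10));    // true
console.log(limiter.tryAcquire(20));    // false: limit hit in this window
console.log(limiter.tryAcquire(1_050)); // true: next window started
```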
Changed
- Optimized batch operations (3x faster)
- Reduced memory usage for large queues
[1.3.0] - 2026-01-05
- Cron job scheduling
- Webhook notifications
- Job progress tracking
- Job logs
- Memory leak in event listeners
- Race condition in batch acknowledgment
[1.2.0] - 2025-12-28
- Priority queues
- Delayed jobs
- Retry with exponential backoff
- Job timeout
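The interplay of priority queues and delayed jobs can be sketched as a selection rule. This is an illustrative model with assumed conventions (lower number = higher priority; a delayed job is held until its due timestamp), not the library's scheduler.

```typescript
// Sketch: pick the next runnable job, honoring delay before priority.
interface PendingJob {
  id: string;
  priority: number; // assumed: lower number = higher priority
  runAt: number;    // assumed: epoch ms before which the job is held back
}

function nextJob(jobs: PendingJob[], now: number): PendingJob | undefined {
  return jobs
    .filter((j) => j.runAt <= now)               // skip still-delayed jobs
    .sort((a, b) => a.priority - b.priority)[0]; // highest priority first
}

const queue: PendingJob[] = [
  { id: "low", priority: 10, runAt: 0 },
  { id: "high-but-delayed", priority: 1, runAt: 5_000 },
  { id: "medium", priority: 5, runAt: 0 },
];
console.log(nextJob(queue, 1_000)?.id); // "medium": the delayed job isn't due yet
console.log(nextJob(queue, 6_000)?.id); // "high-but-delayed"
```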
Changed
- Improved SQLite schema with indexes
- Better error handling
[1.1.0] - 2025-12-20
- TCP protocol for high-performance clients
- HTTP API with WebSocket support
- Authentication tokens
- CORS configuration
[1.0.0] - 2025-12-15
- Initial release
- Queue and Worker classes
- SQLite persistence with WAL mode
- Basic DLQ support
- CLI for server and client operations