bunqueue Architecture: Sharded Job Queue System Design for Bun

bunqueue is a high-performance job queue built for Bun with SQLite persistence. This section covers the internal architecture, data flows, and design decisions.

┌──────────────────────────────────────┐
│             CLIENT LAYER             │
│ Queue.add() ──────► Worker.process() │
│      │                    ▲          │
│      ▼                    │          │
│   TcpPool ◄── msgpack ──► TcpPool    │
└──────┬────────────────────┬──────────┘
       │      TCP :6789     │
┌──────▼────────────────────▼──────────┐
│             SERVER LAYER             │
│                                      │
│  ┌────────────────────────────────┐  │
│  │          QueueManager          │  │
│  │                                │  │
│  │  ┌──────────────────────────┐  │  │
│  │  │  N Shards (auto-detect)  │  │  │
│  │  │  ┌──────┬──────┬──────┐  │  │  │
│  │  │  │Shard0│Shard1│ ...N │  │  │  │
│  │  │  └──────┴──────┴──────┘  │  │  │
│  │  └──────────────────────────┘  │  │
│  │                                │  │
│  │  jobIndex    │  completedJobs  │  │
│  │  customIdMap │  jobResults     │  │
│  └────────────────────────────────┘  │
│                  │                   │
│  ┌───────────────▼───────────────┐   │
│  │       PERSISTENCE LAYER       │   │
│  │  WriteBuffer ──► SQLite (WAL) │   │
│  └───────────────────────────────┘   │
│                                      │
│  ┌────────────────────────────────┐  │
│  │        BACKGROUND TASKS        │  │
│  │  Scheduler  │  Stall Detection │  │
│  │  DLQ Maint  │  Cleanup         │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘

The system is organized into the following layers:

Layer           Purpose               Key Components
--------------  --------------------  ------------------------------------
Client          SDK for applications  Queue, Worker, FlowProducer, TcpPool
Server          Request handling      TcpServer, HttpServer, Handlers
Application     Orchestration         QueueManager, Operations, Managers
Domain          Business logic        Shard, PriorityQueue, DLQ
Infrastructure  External systems      SQLite, S3 Backup, Scheduler
Shared          Utilities             Hash, Lock, LRU, MinHeap

Related documentation sections:

Section            Description
-----------------  -------------------------------------------------
Client SDK         TCP connection, job submission, worker processing
Domain Layer       Sharding, priority queues, DLQ logic
Application Layer  Operations flow, background tasks
Persistence        SQLite configuration, write buffering, servers
Data Structures    Core algorithms and complexities
TCP Protocol       Wire format and commands

Queues are distributed across N independent shards (N auto-detected from CPU cores) using an FNV-1a hash of the queue name, so all of a queue's jobs live on the same shard:

SHARD_COUNT = calculateShardCount() // Power of 2, based on CPU cores, max 64
SHARD_MASK = SHARD_COUNT - 1
shardIndex = fnv1aHash(queueName) & SHARD_MASK
// Examples: 4 cores → 4 shards, 10 cores → 16 shards, 64+ cores → 64 shards
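
The routing above can be sketched in TypeScript. The 32-bit FNV-1a constants are the standard offset basis and prime; the shard count of 16 is illustrative (a 10-core machine rounds up to the next power of two):

```typescript
// 32-bit FNV-1a: standard offset basis 0x811c9dc5 and prime 0x01000193.
function fnv1aHash(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply mod 2^32, keep unsigned
  }
  return hash;
}

// Illustrative: a 10-core machine rounds up to 16 shards (power of 2).
const SHARD_COUNT = 16;
const SHARD_MASK = SHARD_COUNT - 1;

// AND with (2^k - 1) is equivalent to % SHARD_COUNT, but cheaper.
const shardIndex = fnv1aHash("emails") & SHARD_MASK;
```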

Benefits:

  • Auto-scales with hardware (power of 2, max 64)
  • Parallel operations on different queues
  • Reduced lock contention
  • Bitwise AND is faster than the modulo operator for shard selection

Each shard's priority queue is a 4-ary heap rather than a binary heap:

  • Better cache locality (a node's four children fit in a single cache line)
  • Fewer tree levels (8 vs. 16 for 65k items)
  • O(log₄ n) push and pop operations
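
A minimal TypeScript sketch of a 4-ary min-heap (simplified: bunqueue's real heap stores job entries with generations, not bare numbers; the index math is the key difference from a binary heap):

```typescript
class QuaternaryHeap {
  private a: number[] = [];

  push(x: number): void {
    this.a.push(x);
    let i = this.a.length - 1;
    while (i > 0) {
      const p = (i - 1) >> 2; // parent of i in a 4-ary heap
      if (this.a[p] <= this.a[i]) break;
      [this.a[p], this.a[i]] = [this.a[i], this.a[p]];
      i = p;
    }
  }

  pop(): number | undefined {
    if (this.a.length === 0) return undefined;
    const top = this.a[0];
    const last = this.a.pop()!;
    if (this.a.length > 0) {
      this.a[0] = last;
      let i = 0;
      for (;;) {
        let m = i;
        // children of i are 4i+1 .. 4i+4; pick the smallest
        for (let c = 4 * i + 1; c <= 4 * i + 4 && c < this.a.length; c++) {
          if (this.a[c] < this.a[m]) m = c;
        }
        if (m === i) break;
        [this.a[i], this.a[m]] = [this.a[m], this.a[i]];
        i = m;
      }
    }
    return top;
  }
}
```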

Jobs are batched in memory before being written to SQLite:

┌─────────┐   10ms or    ┌───────────────────┐
│ Buffer  │ ──────────►  │ Multi-row INSERT  │
│ (100)   │   100 jobs   │ ~100k jobs/sec    │
└─────────┘              └───────────────────┘
  • Buffered mode: ~100k jobs/sec, but up to 10ms of writes can be lost on a crash
  • Durable mode: ~10k jobs/sec, with each job persisted immediately

Removing an arbitrary entry from the middle of a heap is expensive, so heap entries use generation tracking for lazy deletion:

Remove: delete the job from the index (O(1)) and leave its heap entry behind as stale
Pop: skip entries whose generation no longer matches the index
Compact: rebuild the heap when more than 20% of entries are stale
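
A simplified TypeScript sketch of generation-tracked lazy deletion (a sorted array stands in for the real 4-ary heap, and the compaction itself is omitted):

```typescript
type HeapEntry = { id: string; priority: number; gen: number };

class LazyPriorityQueue {
  private entries: HeapEntry[] = []; // stand-in for the 4-ary heap
  private current = new Map<string, number>(); // live generation per job id
  private stale = 0;

  push(id: string, priority: number): void {
    const prev = this.current.get(id);
    if (prev !== undefined) this.stale++; // re-push: old heap entry goes stale
    const gen = (prev ?? 0) + 1;
    this.current.set(id, gen);
    this.entries.push({ id, priority, gen });
  }

  remove(id: string): void {
    // O(1): drop from the index only; the heap entry stays behind as stale
    if (this.current.delete(id)) this.stale++;
  }

  pop(): string | undefined {
    this.entries.sort((a, b) => a.priority - b.priority);
    while (this.entries.length > 0) {
      const e = this.entries.shift()!;
      if (this.current.get(e.id) === e.gen) {
        this.current.delete(e.id);
        return e.id;
      }
      this.stale--; // generation mismatch: skip the stale entry
    }
    return undefined;
  }

  staleRatio(): number {
    // the real implementation rebuilds (compacts) when this exceeds 0.20
    return this.entries.length === 0 ? 0 : this.stale / this.entries.length;
  }
}
```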

Locks are always acquired in a fixed order to prevent deadlocks:

1. jobIndex (read-only)
2. completedJobs (check before lock)
3. shardLocks[N]
4. processingLocks[N]
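
A sketch of ordered multi-shard acquisition for step 3; the promise-chain `Mutex` here is a hypothetical stand-in for bunqueue's Lock utility:

```typescript
// Hypothetical stand-in for bunqueue's Lock utility: a promise-chain mutex.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  lock(): Promise<() => void> {
    let release!: () => void;
    const next = new Promise<void>((resolve) => (release = resolve));
    const acquired = this.tail.then(() => release); // wait for prior holder
    this.tail = next;
    return acquired;
  }
}

// Take multi-shard locks in ascending index order, so two callers can
// never each hold a lock the other is waiting for.
async function withShardLocks(
  locks: Mutex[],
  indices: number[],
  fn: () => void,
): Promise<void> {
  const ordered = [...new Set(indices)].sort((a, b) => a - b);
  const releases: Array<() => void> = [];
  for (const i of ordered) releases.push(await locks[i].lock());
  try {
    fn();
  } finally {
    releases.reverse().forEach((release) => release());
  }
}
```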

Bounded in-memory collections and their eviction policies:

Collection     Limit   Eviction
-------------  ------  ----------
completedJobs  50,000  FIFO batch
jobResults     5,000   LRU
jobLogs        10,000  LRU
customIdMap    50,000  LRU
DLQ per queue  10,000  FIFO
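
The LRU policies above can be implemented on a JavaScript Map, which iterates in insertion order; the capacity here is illustrative:

```typescript
class LRUCache<K, V> {
  private map = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key);
    this.map.set(key, value); // re-insert to mark as most recently used
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // first key in iteration order is the least recently used
      this.map.delete(this.map.keys().next().value!);
    }
  }

  get size(): number {
    return this.map.size;
  }
}
```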

Operation complexity summary:

Operation   Complexity
----------  ----------
PUSH        O(log₄ n)
PULL        O(log₄ n)
ACK         O(1)
ACK batch   O(shards)
Job lookup  O(1)
Stats       O(1)