Deep Dive: How Sharding Distributes Jobs Across CPU Cores

One of bunqueue’s key design decisions is sharding jobs across multiple priority queues. Instead of a single global queue with a lock, bunqueue creates multiple independent shards that can be accessed concurrently. Here’s how it works.

Why Shard?

A single priority queue becomes a bottleneck when multiple workers pull jobs simultaneously. Lock contention grows with concurrency. Sharding solves this by partitioning the job space:

  • Each shard has its own priority queue
  • Workers can pull from different shards concurrently
  • Lock contention drops by up to a factor of N (the shard count)

Shard Count: Auto-Detected from CPU Cores

The number of shards is calculated at startup based on available CPU cores, rounded up to the nearest power of 2:

function calculateShardCount(): number {
  const cpuCount = navigator.hardwareConcurrency || 4;
  // Round up to the next power of 2, clamped to [4, 64]
  const pow2 = Math.min(
    64,
    Math.max(4, 1 << Math.ceil(Math.log2(cpuCount)))
  );
  return pow2;
}
  CPU Cores | Shard Count
  ----------|------------
  1-4       | 4
  5-8       | 8
  9-16      | 16
  17-32     | 32
  33+       | 64

The power-of-2 constraint is critical for the hashing strategy.

FNV-1a: Fast Deterministic Hashing

Jobs are assigned to shards using FNV-1a hash on the queue name:

function fnv1aHash(str: string): number {
  let hash = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    // Multiply by the FNV prime in 32-bit space; Math.imul avoids the
    // precision loss of a plain JS multiply, whose intermediate product
    // can exceed 2^53
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // Ensure unsigned
}
const SHARD_COUNT = calculateShardCount();
const SHARD_MASK = SHARD_COUNT - 1;

function shardIndex(queueName: string): number {
  return fnv1aHash(queueName) & SHARD_MASK;
}
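
As a quick illustration of the mapping (the queue names here are made up, and the exact indices depend on the hash):

console.log(shardIndex("emails") === shardIndex("emails")); // true: same name, same shard
for (const name of ["emails", "reports", "webhooks"]) {
  console.log(name, "->", shardIndex(name)); // always in [0, SHARD_COUNT - 1]
}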

Because SHARD_COUNT is always a power of 2, we use bitwise AND instead of modulo. This is a single CPU instruction vs. an expensive division:

// Fast: single AND instruction
const fastIdx = hash & SHARD_MASK;
// Slow: division + remainder
const slowIdx = hash % SHARD_COUNT;
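
For any power-of-2 N, hash & (N - 1) and hash % N agree on unsigned values; a throwaway check (illustrative only):

const hash = fnv1aHash("emails"); // unsigned 32-bit
for (const n of [4, 8, 16, 32, 64]) {
  console.assert((hash & (n - 1)) === hash % n);
}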

The Shard Structure

Each shard manages multiple named queues, each backed by an IndexedPriorityQueue:

class Shard {
  private readonly queues = new Map<string, IndexedPriorityQueue>();
  private readonly dlqEntries = new Map<string, DlqEntry[]>();
  private readonly stallConfigs = new Map<string, StallConfig>();

  getQueue(name: string): IndexedPriorityQueue {
    let queue = this.queues.get(name);
    if (!queue) {
      queue = new IndexedPriorityQueue();
      this.queues.set(name, queue);
    }
    return queue;
  }

  push(job: Job): void {
    this.getQueue(job.queue).push(job);
  }

  pop(queueName: string): Job | null {
    return this.getQueue(queueName).pop();
  }
}
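
Tying the shard array to the hash from earlier, here is a minimal sketch of how pushes and pulls route to a shard (the shards array and the pushJob/pullJob helpers are illustrative, not bunqueue's public API):

const shards: Shard[] = Array.from({ length: SHARD_COUNT }, () => new Shard());

function pushJob(job: Job): void {
  shards[shardIndex(job.queue)].push(job);
}

function pullJob(queueName: string): Job | null {
  // All of a queue's jobs live in one shard, so a single lookup suffices
  return shards[shardIndex(queueName)].pop(queueName);
}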

Processing Shards: Separate Lock Domain

Active jobs (being processed by workers) are tracked in a separate set of processing shards with their own locking:

// Processing shards use job ID hashing (not queue name)
function processingShardIndex(jobId: JobId): number {
  return fnv1aHash(String(jobId)) & PROCESSING_SHARD_MASK;
}
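
A minimal sketch of how this might look, assuming a PROCESSING_SHARD_COUNT constant and hypothetical trackProcessing/ack helpers (not bunqueue's actual API), with Job assumed to carry an id field:

const PROCESSING_SHARD_COUNT = 16; // assumed for illustration
const PROCESSING_SHARD_MASK = PROCESSING_SHARD_COUNT - 1;

const processing: Map<JobId, Job>[] = Array.from(
  { length: PROCESSING_SHARD_COUNT },
  () => new Map()
);

function trackProcessing(job: Job): void {
  // Keyed by job ID, so acks for different jobs rarely touch the same shard
  processing[processingShardIndex(job.id)].set(job.id, job);
}

function ack(jobId: JobId): boolean {
  return processing[processingShardIndex(jobId)].delete(jobId);
}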

This separation is intentional. The lock hierarchy is strictly enforced:

  1. jobIndex (read)
  2. completedJobs (read)
  3. shards[N] (write)
  4. processingShards[N] (write)

Violating this order would create deadlocks.
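
As a sketch of what honoring the hierarchy looks like in practice (the Mutex interface, the lock arrays, and moveToProcessing are assumptions for illustration, not bunqueue's actual API):

interface Mutex {
  acquire(): Promise<void>;
  release(): void;
}

declare const shardLocks: Mutex[];      // one per shard
declare const processingLocks: Mutex[]; // one per processing shard

async function moveToProcessing(job: Job): Promise<void> {
  // Acquire in hierarchy order: shard lock (3) before processing-shard lock (4)
  const shardLock = shardLocks[shardIndex(job.queue)];
  const procLock = processingLocks[processingShardIndex(job.id)];
  await shardLock.acquire();
  try {
    await procLock.acquire();
    try {
      // ... pop from the shard's queue, record in the processing shard ...
    } finally {
      procLock.release();
    }
  } finally {
    shardLock.release();
  }
}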

Performance Impact

Sharding delivers measurable throughput improvements on multi-core systems:

  Scenario   | Single Queue    | 4 Shards | 16 Shards
  -----------|-----------------|----------|----------
  1 worker   | baseline        | ~same    | ~same
  4 workers  | contention      | ~3.5x    | ~3.8x
  16 workers | high contention | ~8x      | ~14x

The gains come from reduced lock contention. With a single queue, all workers compete for one lock. With N shards, workers pulling from queues that hash to different shards never contend.

Key Design Decisions

Queue-name sharding, not job-ID sharding. All jobs for a queue live in one shard. This means a single queue can’t benefit from sharding, but it dramatically simplifies the pull operation (no need to check multiple shards).

Separate processing shards. Active jobs are tracked separately from queued jobs. This prevents pull operations from blocking ack/fail operations.

Static shard count. Shards are created at startup and never change. This avoids the complexity of resharding and makes the hash-to-shard mapping immutable.