Telemetry & OpenTelemetry

bunqueue includes built-in telemetry for production observability. All instrumentation is zero-allocation and adds less than 0.003% overhead (~25ns per operation).

Latency Histograms

Every push, pull, and ack operation is timed using Bun.nanoseconds() and recorded in Prometheus-compatible histograms.
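
The pattern looks roughly like this; insertIntoShard and pushHistogram are hypothetical stand-ins for bunqueue internals, not its actual API:

type Job = { id: number; payload: unknown };

// Hypothetical stand-ins for bunqueue internals:
const pushHistogram = { observe(_ms: number) { /* record into the histogram */ } };
function insertIntoShard(_job: Job) { /* shard insertion elided */ }

function timedPush(job: Job): void {
  const start = Bun.nanoseconds();                      // monotonic, ns since process start
  insertIntoShard(job);                                 // the operation being measured
  const elapsedMs = (Bun.nanoseconds() - start) / 1e6;  // ns -> ms
  pushHistogram.observe(elapsedMs);                     // record in the latency histogram
}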

Metrics

Metric                     Description
bunqueue_push_duration_ms  Time to push a job (includes lock acquisition and shard insertion)
bunqueue_pull_duration_ms  Time to pull a job from a queue
bunqueue_ack_duration_ms   Time to acknowledge a completed job

Bucket Boundaries

Default boundaries (in milliseconds):

0.1, 0.5, 1, 2.5, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000
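
The Architecture section below notes that observations land in a Float64Array via binary search over these fixed boundaries. A minimal illustrative sketch, not bunqueue's actual Histogram class:

const BOUNDS = [0.1, 0.5, 1, 2.5, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000];

class Histogram {
  counts = new Float64Array(BOUNDS.length + 1); // last slot is the +Inf bucket
  sum = 0;
  total = 0;

  observe(ms: number): void {
    // binary search for the first boundary >= ms ("le" semantics)
    let lo = 0, hi = BOUNDS.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (BOUNDS[mid] < ms) lo = mid + 1; else hi = mid;
    }
    this.counts[lo]++;  // lo === BOUNDS.length means the +Inf bucket
    this.sum += ms;
    this.total++;
  }
}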

Prometheus Output

Each histogram exposes cumulative bucket series plus _sum and _count:

# HELP bunqueue_push_duration_ms Push operation latency in ms
# TYPE bunqueue_push_duration_ms histogram
bunqueue_push_duration_ms_bucket{le="0.1"} 120
bunqueue_push_duration_ms_bucket{le="0.5"} 89500
bunqueue_push_duration_ms_bucket{le="1"} 145000
bunqueue_push_duration_ms_bucket{le="2.5"} 149800
bunqueue_push_duration_ms_bucket{le="+Inf"} 150000
bunqueue_push_duration_ms_sum 12045.3
bunqueue_push_duration_ms_count 150000
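
For illustration, here is how cumulative bucket, sum, and count lines like those above could be rendered from per-bucket counts. The function and its signature are hypothetical, not part of bunqueue:

function renderHistogram(
  name: string,
  bounds: number[],
  counts: Float64Array,  // per-bucket counts, one extra slot for +Inf
  sum: number,
): string {
  const lines = [`# TYPE ${name} histogram`];
  let cumulative = 0;
  for (let i = 0; i < bounds.length; i++) {
    cumulative += counts[i];                               // buckets are cumulative
    lines.push(`${name}_bucket{le="${bounds[i]}"} ${cumulative}`);
  }
  cumulative += counts[bounds.length];                     // the +Inf bucket
  lines.push(`${name}_bucket{le="+Inf"} ${cumulative}`);
  lines.push(`${name}_sum ${sum}`);
  lines.push(`${name}_count ${cumulative}`);               // count == total observations
  return lines.join('\n');
}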

Percentiles

Use Prometheus histogram_quantile() for SLO tracking:

# p99 push latency
histogram_quantile(0.99, rate(bunqueue_push_duration_ms_bucket[5m]))
# p50 pull latency
histogram_quantile(0.50, rate(bunqueue_pull_duration_ms_bucket[5m]))

Programmatic Access

import { latencyTracker } from 'bunqueue/application/latencyTracker';
// Get averages
const avg = latencyTracker.getAverages();
// { pushMs: 0.08, pullMs: 0.12, ackMs: 0.05 }
// Get percentiles
const pct = latencyTracker.getPercentiles();
// { push: { p50: 0.05, p95: 0.25, p99: 1.0 }, pull: {...}, ack: {...} }
// Reset histograms
latencyTracker.push.reset();
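
For intuition on how percentiles like p99 can be derived from fixed buckets, here is a sketch of bucket-based quantile estimation with linear interpolation (the same idea behind histogram_quantile()). It is illustrative only, not bunqueue's actual getPercentiles() implementation:

function quantile(
  q: number,             // e.g. 0.99 for p99
  bounds: number[],      // sorted upper bounds, e.g. the defaults above
  counts: Float64Array,  // per-bucket counts, last slot = +Inf
  total: number,         // total observations
): number {
  const target = q * total;
  let cumulative = 0;
  for (let i = 0; i < bounds.length; i++) {
    const prev = cumulative;
    cumulative += counts[i];
    if (cumulative >= target && counts[i] > 0) {
      const lower = i === 0 ? 0 : bounds[i - 1];
      // linear interpolation within the bucket, as histogram_quantile() does
      return lower + ((bounds[i] - lower) * (target - prev)) / counts[i];
    }
  }
  return bounds[bounds.length - 1];  // landed in +Inf: clamp to the largest bound
}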

Throughput Tracking

Real-time throughput is tracked using Exponential Moving Average (EMA) with alpha=0.3. This provides smooth rate estimates with O(1) cost and zero allocations.
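
A sketch of the smoothing step, using alpha = 0.3 as stated above; the counter and variable names are illustrative:

const ALPHA = 0.3;
let pushPerSecEma = 0;
let pushCount = 0;  // incremented on every push

function sampleRate(elapsedSec: number): number {
  const raw = pushCount / elapsedSec;                         // raw rate for this window
  pushPerSecEma = ALPHA * raw + (1 - ALPHA) * pushPerSecEma;  // EMA update
  pushCount = 0;                                              // start the next window
  return pushPerSecEma;
}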

Available Rates

Rate            Description
pushPerSec      Jobs pushed per second
pullPerSec      Jobs pulled per second
completePerSec  Jobs completed (acked) per second
failPerSec      Jobs failed per second

Accessing Rates

Rates are exposed via the /stats HTTP endpoint:

curl http://localhost:6790/stats

{
  "ok": true,
  "stats": {
    "pushPerSec": 12500,
    "pullPerSec": 12480,
    "avgLatencyMs": 0.08,
    "avgProcessingMs": 0.12
  }
}
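
The same data can be read from any HTTP client. A minimal TypeScript sketch, assuming the endpoint and response shape shown above:

const res = await fetch('http://localhost:6790/stats');
const body = await res.json() as {
  ok: boolean;
  stats: { pushPerSec: number; pullPerSec: number; avgLatencyMs: number; avgProcessingMs: number };
};
console.log(`push: ${body.stats.pushPerSec}/s, pull: ${body.stats.pullPerSec}/s`);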

Programmatic Access

import { throughputTracker } from 'bunqueue/application/throughputTracker';
const rates = throughputTracker.getRates();
// { pushPerSec: 12500, pullPerSec: 12480, completePerSec: 12400, failPerSec: 3 }

Per-Queue Metrics

All queue metrics include a queue label for drill-down by queue name. This enables per-queue dashboards and alerts in Grafana.

Metrics

Metric                                    Type   Description
bunqueue_queue_jobs_waiting{queue="..."}  gauge  Waiting jobs in a specific queue
bunqueue_queue_jobs_delayed{queue="..."}  gauge  Delayed jobs in a specific queue
bunqueue_queue_jobs_active{queue="..."}   gauge  Active jobs in a specific queue
bunqueue_queue_jobs_dlq{queue="..."}      gauge  DLQ entries for a specific queue

Prometheus Output

bunqueue_queue_jobs_waiting{queue="emails"} 30
bunqueue_queue_jobs_waiting{queue="payments"} 12
bunqueue_queue_jobs_active{queue="emails"} 5
bunqueue_queue_jobs_delayed{queue="payments"} 100
bunqueue_queue_jobs_dlq{queue="emails"} 2

Grafana Queries

# Waiting jobs for a specific queue
bunqueue_queue_jobs_waiting{queue="emails"}
# Total active jobs across all queues
sum(bunqueue_queue_jobs_active)
# Top 5 queues by backlog
topk(5, bunqueue_queue_jobs_waiting)

Programmatic Access

const perQueue = queueManager.getPerQueueStats();
// Map<string, { waiting, delayed, active, dlq }>
const emailStats = perQueue.get('emails');
// { waiting: 30, delayed: 0, active: 5, dlq: 2 }
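
To connect the two views, here is a sketch of rendering this Map into the labeled gauge lines shown under Prometheus Output; the loop is illustrative and assumes queueManager is in scope as in the example above:

const gaugeLines: string[] = [];
for (const [queue, s] of queueManager.getPerQueueStats()) {
  gaugeLines.push(`bunqueue_queue_jobs_waiting{queue="${queue}"} ${s.waiting}`);
  gaugeLines.push(`bunqueue_queue_jobs_delayed{queue="${queue}"} ${s.delayed}`);
  gaugeLines.push(`bunqueue_queue_jobs_active{queue="${queue}"} ${s.active}`);
  gaugeLines.push(`bunqueue_queue_jobs_dlq{queue="${queue}"} ${s.dlq}`);
}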

Log Level Configuration

Configure log verbosity at startup with the LOG_LEVEL environment variable.

Levels

Level  Priority  Description
debug  0         Verbose debugging output
info   1         General operational messages (default)
warn   2         Warning conditions
error  3         Error conditions only

Messages below the configured level are filtered with an early return (zero cost).
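
A minimal sketch of this priority-table-plus-early-return pattern; the Logger shape here is illustrative, not bunqueue's actual implementation:

const PRIORITY = { debug: 0, info: 1, warn: 2, error: 3 } as const;
type Level = keyof typeof PRIORITY;

let minLevel: Level = 'info';  // set from LOG_LEVEL at startup

function log(level: Level, message: string): void {
  if (PRIORITY[level] < PRIORITY[minLevel]) return;  // filtered: early return
  console.log(`[${level}] ${message}`);
}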

Configuration

# Environment variable
LOG_LEVEL=debug bun run src/main.ts
# Combined with JSON output
LOG_LEVEL=warn LOG_FORMAT=json bun run src/main.ts

Runtime Change

import { Logger } from 'bunqueue/shared/logger';
Logger.setLevel('debug'); // Enable verbose logging
Logger.setLevel('error'); // Only errors

Integrations

bunqueue exposes a standard Prometheus /prometheus endpoint and structured JSON logs. This makes it compatible out-of-the-box with all major observability platforms.

Metrics (Prometheus Scraping)

Platform              How to Connect
Prometheus            Native scrape on /prometheus
Grafana Cloud         Grafana Agent or Alloy scraping /prometheus
Axiom                 Prometheus remote write or scrape from /prometheus
Datadog               Agent with openmetrics check on /prometheus
New Relic             Prometheus remote write or nri-prometheus
Victoria Metrics      Direct scrape, drop-in Prometheus replacement
Chronosphere          Prometheus remote write
Splunk Observability  OTel Collector with Prometheus receiver

Log Aggregation (LOG_FORMAT=json)

Platform             How to Connect
Axiom                Axiom agent or Fluent Bit shipping stdout JSON
Grafana Loki         Promtail or Alloy collecting stdout JSON
Elasticsearch (ELK)  Filebeat shipping JSON logs
Datadog Logs         Datadog Agent collecting stdout
Splunk               Universal Forwarder or HTTP Event Collector
CloudWatch           CloudWatch Agent on stdout

Example: Axiom

# prometheus.yml - Axiom as remote write target
remote_write:
  - url: https://api.axiom.co/v1/integrations/prometheus
    bearer_token: "xaat-your-axiom-token"
    remote_timeout: 30s

Example: Datadog

# datadog agent conf.d/openmetrics.d/conf.yaml
instances:
  - prometheus_url: http://localhost:6790/prometheus
    namespace: bunqueue
    metrics:
      - bunqueue_*

Example: Grafana Cloud

# alloy config
prometheus.scrape "bunqueue" {
  targets      = [{"__address__" = "localhost:6790"}]
  metrics_path = "/prometheus"
  forward_to   = [prometheus.remote_write.grafana_cloud.receiver]
}

Architecture

Performance Characteristics

Component           Overhead                   Allocations
Latency histogram   ~20ns per observe()        Zero (Float64Array)
Throughput counter  ~5ns per increment()       Zero (primitive counter)
Per-queue stats     O(queues) per scrape       Map allocation on scrape only
Log filtering       ~2ns per filtered message  Zero (early return)

How It Works

  1. Latency: Bun.nanoseconds() captures start time before the operation. After completion, the delta is converted to milliseconds and recorded in a Histogram using binary search over fixed bucket boundaries.

  2. Throughput: Each operation increments a counter. On rate calculation (triggered by /stats requests), an EMA smooths the raw count into a per-second rate.

  3. Per-queue: On /prometheus scrape, the system iterates known queue names and aggregates stats from the designated shard (O(1) lookup via shardIndex(queueName); a hash-based sketch follows this list).

  4. Log filtering: A priority lookup table maps level strings to integers. The log() method compares the message level against the configured minimum and returns immediately if below threshold.
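
A sketch of what an O(1) shardIndex(queueName) lookup could look like; the hash function and shard count below are hypothetical, and the real implementation may differ:

const SHARD_COUNT = 16;  // hypothetical; not bunqueue's actual shard count

function shardIndex(queueName: string): number {
  let h = 0;
  for (let i = 0; i < queueName.length; i++) {
    h = (h * 31 + queueName.charCodeAt(i)) | 0;  // simple 32-bit string hash
  }
  return (h >>> 0) % SHARD_COUNT;                // map hash to a shard slot
}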

Alerting Examples

# Alert on high p99 latency
- alert: BunqueueHighPushLatency
expr: histogram_quantile(0.99, rate(bunqueue_push_duration_ms_bucket[5m])) > 50
for: 5m
labels:
severity: warning
annotations:
summary: "Push p99 latency above 50ms"
# Alert on per-queue backlog
- alert: BunqueueQueueBacklog
expr: bunqueue_queue_jobs_waiting{queue="payments"} > 1000
for: 10m
labels:
severity: critical
annotations:
summary: "Payments queue backlog exceeds 1000 jobs"
# Alert on low throughput
- alert: BunqueueLowThroughput
expr: rate(bunqueue_jobs_pushed_total[5m]) < 1
for: 10m
labels:
severity: warning
annotations:
summary: "Push throughput dropped below 1 job/sec"