Monitoring
bunqueue exposes Prometheus-compatible metrics for production monitoring. This guide covers the built-in metrics endpoint and a ready-to-use Grafana dashboard.
Quick Start with Docker Compose
bunqueue includes a pre-configured monitoring stack:

    # Start bunqueue + Prometheus + Grafana
    docker compose --profile monitoring up -d

Access the dashboards:
- Grafana: http://localhost:3000 (admin/bunqueue)
- Prometheus: http://localhost:9090
Prometheus Endpoint
bunqueue exposes metrics at /prometheus on the HTTP port (default 6790):

    curl http://localhost:6790/prometheus

Available Metrics
| Metric | Type | Description |
|---|---|---|
| bunqueue_jobs_waiting | gauge | Jobs waiting in queue |
| bunqueue_jobs_delayed | gauge | Delayed jobs |
| bunqueue_jobs_active | gauge | Jobs being processed |
| bunqueue_jobs_completed | gauge | Completed jobs in memory |
| bunqueue_jobs_dlq | gauge | Jobs in dead letter queue |
| bunqueue_jobs_pushed_total | counter | Total jobs pushed |
| bunqueue_jobs_pulled_total | counter | Total jobs pulled |
| bunqueue_jobs_completed_total | counter | Total jobs completed |
| bunqueue_jobs_failed_total | counter | Total jobs failed |
| bunqueue_uptime_seconds | gauge | Server uptime |
| bunqueue_cron_jobs_total | gauge | Registered cron jobs |
| bunqueue_workers_total | gauge | Registered workers |
| bunqueue_workers_active | gauge | Active workers |
| bunqueue_workers_processed_total | counter | Jobs processed by workers |
| bunqueue_workers_failed_total | counter | Jobs failed by workers |
| bunqueue_webhooks_total | gauge | Total webhooks |
| bunqueue_webhooks_enabled | gauge | Enabled webhooks |
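
To read these metrics from a script rather than through Prometheus, the text exposition format can be parsed directly. The following is a minimal sketch, not part of bunqueue: `parse_metrics` and `SAMPLE` are hypothetical names, and the parser assumes the samples carry no labels (as in the metric names listed above). Labeled samples such as `name{queue="x"} 1` are not handled.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabeled samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is '<name> <value>' with an optional trailing timestamp.
        parts = line.split()
        if len(parts) < 2:
            continue
        samples[parts[0]] = float(parts[1])
    return samples


# Illustrative input in the same shape the endpoint returns.
SAMPLE = """\
# HELP bunqueue_jobs_waiting Number of jobs waiting in queue
# TYPE bunqueue_jobs_waiting gauge
bunqueue_jobs_waiting 42
# HELP bunqueue_jobs_active Number of jobs being processed
# TYPE bunqueue_jobs_active gauge
bunqueue_jobs_active 8
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

In practice the input text would come from an HTTP GET against http://localhost:6790/prometheus, for example with `urllib.request.urlopen` from the standard library.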
Example Output

    # HELP bunqueue_jobs_waiting Number of jobs waiting in queue
    # TYPE bunqueue_jobs_waiting gauge
    bunqueue_jobs_waiting 42

    # HELP bunqueue_jobs_active Number of jobs being processed
    # TYPE bunqueue_jobs_active gauge
    bunqueue_jobs_active 8

    # HELP bunqueue_jobs_pushed_total Total jobs pushed
    # TYPE bunqueue_jobs_pushed_total counter
    bunqueue_jobs_pushed_total 150432

Prometheus Configuration
Add bunqueue to your prometheus.yml:

    scrape_configs:
      - job_name: 'bunqueue'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:6790']
        metrics_path: /prometheus

With authentication:

    scrape_configs:
      - job_name: 'bunqueue'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:6790']
        metrics_path: /prometheus
        bearer_token: 'your-auth-token'

Grafana Dashboard
The included dashboard provides:
Overview Row
- Jobs Waiting, Delayed, Active, Completed, DLQ
- Active Workers, Cron Jobs, Uptime
Throughput & Performance
- Job throughput (pushed/pulled/completed/failed per second)
- Queue depth over time (stacked area chart)
Success & Failure Analysis
- Success rate gauge (with thresholds)
- Failure rate gauge (5-minute window)
- Completed vs Failed bar chart
Workers & Processing
- Worker count over time
- Worker throughput (processed/failed per second)
- Worker utilization gauge
Webhooks & Cron
- Webhook status
- Cron job count
- Lifetime totals
Alerts & Health
- Visual alert indicators for:
  - DLQ > 100 jobs
  - Failure rate > 5%
  - Queue backlog > 10,000
  - No active workers
- Server health
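
The alert indicators above amount to simple threshold checks on a metrics snapshot. The sketch below illustrates that logic outside Grafana; it is not the dashboard's actual implementation. The function name `dashboard_alerts` is hypothetical, and the mapping of each indicator to a metric (for example, `bunqueue_workers_active` for "No active workers") is an assumption. The failure rate is passed in as a fraction because it has to be derived from counters over a time window.

```python
def dashboard_alerts(metrics: dict[str, float], failure_rate: float) -> list[str]:
    """Return the alert indicators that would fire for a metrics snapshot.

    `failure_rate` is a fraction between 0 and 1 (0.05 means 5%).
    The metric-to-indicator mapping is an assumption, not taken from the
    dashboard JSON.
    """
    alerts: list[str] = []
    if metrics.get("bunqueue_jobs_dlq", 0) > 100:
        alerts.append("DLQ > 100 jobs")
    if failure_rate > 0.05:
        alerts.append("Failure rate > 5%")
    if metrics.get("bunqueue_jobs_waiting", 0) > 10_000:
        alerts.append("Queue backlog > 10,000")
    # A missing gauge is treated as zero workers.
    if metrics.get("bunqueue_workers_active", 0) == 0:
        alerts.append("No active workers")
    return alerts


if __name__ == "__main__":
    snapshot = {
        "bunqueue_jobs_dlq": 150,
        "bunqueue_jobs_waiting": 1_200,
        "bunqueue_workers_active": 4,
    }
    print(dashboard_alerts(snapshot, failure_rate=0.02))
```

All comparisons are strict, so a value exactly at a threshold (for example, 100 jobs in the DLQ) does not trigger the indicator.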
Alert Rules
Pre-configured Prometheus alerts in monitoring/alert_rules.yml:
| Alert | Condition | Severity |
|---|---|---|
| BunqueueDLQHigh | DLQ > 100 for 5m | critical |
| BunqueueHighFailureRate | Failure > 5% for 5m | warning |
| BunqueueQueueBacklog | Waiting > 10k for 10m | warning |
| BunqueueNoWorkers | 0 workers + waiting jobs | critical |
| BunqueueServerDown | Server unreachable | critical |
| BunqueueLowThroughput | < 1 job/s for 10m | warning |
| BunqueueWorkerOverload | Utilization > 95% | warning |
| BunqueueJobsStuck | Active jobs, no completions | warning |
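
Conditions such as "Failure > 5%" and "< 1 job/s" are derived from counters, so they depend on how much a counter grew over a window rather than on its absolute value. The sketch below shows that arithmetic on two samples taken some seconds apart. The function names are hypothetical, and the definitions used here (failure ratio as failed / (completed + failed), throughput measured on completed jobs) are assumptions; the exact expressions in monitoring/alert_rules.yml may differ.

```python
def counter_increase(prev: float, curr: float) -> float:
    """Growth of a counter between two samples.

    Counters only go up, so a lower current value is treated as a reset
    (for example, a server restart) and the current value is the growth.
    """
    return curr if curr < prev else curr - prev


def failure_ratio(completed_increase: float, failed_increase: float) -> float:
    """Fraction of finished jobs that failed during the window."""
    total = completed_increase + failed_increase
    return failed_increase / total if total > 0 else 0.0


def per_second(increase: float, window_seconds: float) -> float:
    """Average per-second rate over the window."""
    return increase / window_seconds


if __name__ == "__main__":
    window = 300  # five minutes between the two samples
    completed = counter_increase(150_000, 150_540)
    failed = counter_increase(1_200, 1_260)
    print("failure ratio:", failure_ratio(completed, failed))
    print("completed per second:", per_second(completed, window))
```

With these illustrative numbers, 60 of 600 finished jobs failed, which is above the 5% threshold, while throughput stays above 1 job/s.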
Example Alert Rule

    - alert: BunqueueDLQHigh
      expr: bunqueue_jobs_dlq > 100
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High number of jobs in DLQ"
        description: "{{ $value }} jobs are in the dead letter queue."

CLI Metrics
View metrics from the command line:

    # JSON format
    bunqueue metrics

    # Prometheus format
    bunqueue metrics --format prometheus

    # Server stats
    bunqueue stats

Health Endpoints
bunqueue provides Kubernetes-compatible health endpoints:

    # Detailed health (includes memory stats)
    curl http://localhost:6790/health

    # Kubernetes liveness probe
    curl http://localhost:6790/healthz

    # Kubernetes readiness probe
    curl http://localhost:6790/ready

Debug Endpoints
For troubleshooting:

    # Heap statistics
    curl http://localhost:6790/heapstats

    # Force garbage collection
    curl -X POST http://localhost:6790/gc

File Structure

    monitoring/
    ├── prometheus.yml          # Prometheus config
    ├── alert_rules.yml         # Alert definitions
    └── grafana/
        ├── provisioning/
        │   ├── datasources/    # Auto-configure Prometheus
        │   └── dashboards/     # Auto-load dashboards
        └── dashboards/
            └── bunqueue.json   # Complete dashboard

Custom Dashboards
Import the dashboard JSON directly:
- Open Grafana → Dashboards → Import
- Upload monitoring/grafana/dashboards/bunqueue.json
- Select Prometheus datasource
- Click Import
Best Practices
- Scrape interval: Use 5-15 seconds for real-time visibility
- Retention: Keep 15+ days for trend analysis
- Alerts: Start with the included rules, tune thresholds for your workload
- Labels: Consider adding custom labels for multi-queue environments
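
To see how scrape interval and retention trade off against disk usage, a rough estimate can be computed from the capacity-planning formula in the Prometheus storage documentation (retention time in seconds × ingested samples per second × bytes per sample, with about 1 to 2 bytes per sample). The sketch below applies it; `estimate_disk_bytes` is a hypothetical helper, and the count of 17 series assumes one unlabeled series per metric in the table above. Prometheus also records a few per-target series of its own, so real usage will be somewhat higher.

```python
def estimate_disk_bytes(
    series: int,
    scrape_interval_s: float,
    retention_days: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Rough Prometheus disk estimate for a fixed number of series.

    Uses retention_seconds * samples_per_second * bytes_per_sample, with
    the upper end of the documented 1-2 bytes per sample as the default.
    """
    samples_per_second = series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # 17 series scraped every 5 seconds and kept for 15 days.
    estimate = estimate_disk_bytes(series=17, scrape_interval_s=5, retention_days=15)
    print(f"{estimate / 1_000_000:.1f} MB")
```

At this scale the cost of a 5-second interval is small (on the order of a few megabytes), so the interval can be chosen for visibility rather than storage. Adding per-queue labels multiplies the series count by the number of label combinations.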