PgQ v3.5.1 vs PgQue pre-0.1.0: throughput benchmark with pg_ash + pg-flight-recorder telemetry
## Goal
Benchmark original PgQ (v3.5.1, PL-only mode) and PgQue (pre-0.1.0) side by side on the same hardware with:
- Identical payload sizes and formats
- pg_cron ticker + table rotation running
- pg_ash (1s sampling) + pg-flight-recorder telemetry
- Performance Insights-style visualization
- Sustained runs (10-30 min) to capture checkpoint behavior
## Background
A LinkedIn discussion about PostgreSQL queue throughput versus Redpanda prompted this benchmark of PgQ and PgQue (the modern repackaging of PgQ as an anti-extension).
Previous benchmarking (see "Results so far" below) used mismatched payloads in some comparisons:
- PgQue `send()` used ~1 KiB jsonb
- PgQ `insert_event()` used ~2 KiB text

Those comparisons are **not** apples-to-apples.
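
One way to remove the payload mismatch is to generate the text and JSON payloads from a single target byte size. The following is a minimal sketch, not the generator used in the earlier runs: the helper names `text_payload` and `json_payload` and the `{"data": ...}` wrapper are assumptions made for illustration. It only equalizes the size of what the client sends; the stored size of a jsonb value can still differ from a text value of the same input length.

```python
import json


def text_payload(size_bytes: int) -> str:
    """Return an ASCII text payload that is exactly size_bytes bytes long."""
    return "x" * size_bytes


def json_payload(size_bytes: int) -> str:
    """Return a JSON document whose UTF-8 encoding is exactly size_bytes bytes."""
    # Bytes consumed by the (hypothetical) wrapper object itself.
    overhead = len(json.dumps({"data": ""}))
    if size_bytes < overhead:
        raise ValueError(f"size_bytes must be at least {overhead}")
    return json.dumps({"data": "x" * (size_bytes - overhead)})


TEXT_2K = text_payload(2048)
JSON_2K = json_payload(2048)

# Both payloads have the same encoded size on the wire.
print(len(TEXT_2K.encode()), len(JSON_2K.encode()))
```

The same two strings can then be passed to every API under test, so that any throughput difference is not caused by the bytes sent per event.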
## What we have so far
### Hardware
- Apple Silicon, 10 cores, 24 GiB RAM, APFS SSD
- PostgreSQL 18.3 (Homebrew)
### Software versions
- **PgQ:** 3.5.1 (PL-only mode via `pgq_pl_only.sql`, no C extension)
- **PgQue:** pre-0.1.0 (main branch as of 2026-04-14)
- **pg_ash:** latest (1-second wait event sampling)
- **pg-flight-recorder:** latest (WAL, checkpoint, I/O snapshots)
### Tuning applied
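```ini
# postgresql.conf settings
synchronous_commit = off        # do not wait for WAL flush at commit
shared_buffers = 2GB            # change requires restart
max_wal_size = 4GB
wal_level = minimal             # change requires restart; needs max_wal_senders = 0
wal_compression = lz4
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 400
bgwriter_lru_multiplier = 4.0
```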
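
To make the tuning reproducible on another machine, the settings above could be emitted as `ALTER SYSTEM` statements. This is a sketch under assumptions: the function names are hypothetical, and how the settings were actually applied for these runs is not stated here. `shared_buffers` and `wal_level` only take effect after a server restart; the remaining parameters can be picked up with a configuration reload.

```python
SETTINGS = {
    "synchronous_commit": "off",
    "shared_buffers": "2GB",
    "max_wal_size": "4GB",
    "wal_level": "minimal",
    "wal_compression": "lz4",
    "bgwriter_delay": "50ms",
    "bgwriter_lru_maxpages": "400",
    "bgwriter_lru_multiplier": "4.0",
}

# Parameters from the list above that only take effect after a restart.
RESTART_REQUIRED = {"shared_buffers", "wal_level"}


def alter_system_statements(settings: dict[str, str]) -> list[str]:
    """Render each setting as an ALTER SYSTEM SET statement."""
    return [f"ALTER SYSTEM SET {name} = '{value}';" for name, value in settings.items()]


def restart_required(settings: dict[str, str]) -> list[str]:
    """Return the names of settings that need a server restart, sorted."""
    return sorted(RESTART_REQUIRED & set(settings))


for stmt in alter_system_statements(SETTINGS):
    print(stmt)
print("restart needed for:", ", ".join(restart_required(SETTINGS)))
```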
### Results so far
**10-min run, 8 clients, ~2 KiB payloads in the same format for both `insert_event()` rows (fair comparison); the `send()` row is shown for reference only:**
| API | 10-min avg ev/s | vs PgQ |
|-----|----------------|--------|
| PgQ `insert_event()` | 83,294 | baseline |
| PgQue `insert_event()` | 73,278 | -12% |
| PgQue `send()` (~1 KiB jsonb, NOT apples-to-apples) | 81,242 | -2% |
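
The "vs PgQ" column and the effect of payload size can be checked with a little arithmetic. The sketch below recomputes the relative difference from the table's averages and converts events per second into payload MiB/s. The payload sizes are the nominal "~2 KiB" and "~1 KiB" figures taken as exactly 2048 and 1024 bytes, which is an assumption; real payloads may differ by a few bytes.

```python
def pct_vs_baseline(value: float, baseline: float) -> float:
    """Relative difference of value against baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


def mib_per_s(events_per_s: float, payload_bytes: int) -> float:
    """Payload volume in MiB/s for a given event rate and payload size."""
    return events_per_s * payload_bytes / (1024 * 1024)


# (API, 10-min avg ev/s from the table, assumed nominal payload size in bytes)
results = [
    ("PgQ insert_event()", 83_294, 2048),
    ("PgQue insert_event()", 73_278, 2048),
    ("PgQue send()", 81_242, 1024),
]

baseline = results[0][1]
for name, ev_s, size in results:
    print(
        f"{name:22s} {pct_vs_baseline(ev_s, baseline):+6.1f}%  "
        f"{mib_per_s(ev_s, size):6.1f} MiB/s"
    )
```

Under these nominal sizes, `send()` moves roughly half the payload bytes per second of PgQ `insert_event()` despite a similar event rate, which is why that row should not be read as a like-for-like result.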
**15-min run with telemetry (mismatched payloads, not apples-to-apples):**
| API | 15-min avg ev/s |
|-----|----------------|
| PgQue `send()` (~1 KiB jsonb) | 81,691 |
| PgQ `insert_event()` (~2 KiB text) | 70,871 |
**Wait event profile (pg_ash, 30 min combined):**
| Wait event | % of samples | Chart color | Interpretation |
|------------|--------------|-------------|----------------|
| CPU (on CPU) | 49.3% | Green | Useful work |
| Client:ClientRead | 35.5% | Yellow | pgbench overhead |
| IO:DataFileWrite | 6.2% | Blue | Checkpoint I/O |
| Lock:extend | 4.0% | Red | Relation extension |
| LWLock:BufferContent | 2.7% | Pink | Buffer contention |
**Key finding:** bgwriter tuning (`bgwriter_delay = 50ms`, `bgwriter_lru_maxpages = 400`) reduced the IO:DataFileWrite share from 57% to 6.2%.
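
A wait event profile like the one above is a count of sampled active sessions per wait event, divided by the total number of samples. The sketch below shows that aggregation and one optional follow-up step: dropping `Client:ClientRead` and renormalizing to see the server-side breakdown. It is an illustration only; the function names are hypothetical and this is not pg_ash's schema or query interface. The `other` entry is the remainder that the table does not itemize.

```python
from collections import Counter


def wait_profile(samples):
    """Turn an iterable of wait labels (one per sampled session) into percentages."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.most_common()}


def server_side_profile(profile, client_waits=("Client:ClientRead",)):
    """Drop client-side waits and renormalize the remaining shares to 100%."""
    kept = {k: v for k, v in profile.items() if k not in client_waits}
    total = sum(kept.values())
    return {k: 100.0 * v / total for k, v in kept.items()}


# Percentages reported in the table above.
reported = {
    "CPU": 49.3,
    "Client:ClientRead": 35.5,
    "IO:DataFileWrite": 6.2,
    "Lock:extend": 4.0,
    "LWLock:BufferContent": 2.7,
}
# Remainder not itemized in the table.
reported["other"] = 100.0 - sum(reported.values())

for label, pct in server_side_profile(reported).items():
    print(f"{label:22s} {pct:5.1f}%")
```

Excluding client waits is a judgment call: it is useful for reasoning about where the server spends time, but the original percentages remain the right numbers for comparing against the chart.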
### Performance Insights-style ASH chart
[Available in PR](https://github.com/NikolayS/pgque/pull/44). The chart shows average active sessions (AAS) at 1-second resolution, stacked by wait type, using the pg_ash color scheme.
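
For readers unfamiliar with this chart type, the stacked series can be derived from raw samples as follows. This is a rough sketch assuming each sample is a `(epoch_seconds, wait_label)` pair for one active session; the function name `aas_series` and the input shape are hypothetical and do not reflect how pg_ash stores or exposes its data. With 1-second sampling and 1-second buckets, the AAS for a wait event is simply the number of sessions sampled in that state during that second.

```python
from collections import defaultdict


def aas_series(samples, bucket_s=1, sample_interval_s=1):
    """Build {bucket_start: {wait_label: average active sessions}} from samples.

    samples: iterable of (epoch_seconds, wait_label), one entry per active
    session per sampling tick.
    """
    samples_per_bucket = bucket_s / sample_interval_s
    buckets = defaultdict(lambda: defaultdict(float))
    for ts, label in samples:
        start = int(ts // bucket_s) * bucket_s
        # Each sample contributes one session for one tick of the bucket.
        buckets[start][label] += 1.0 / samples_per_bucket
    return {start: dict(stack) for start, stack in sorted(buckets.items())}


demo = [(0, "CPU"), (0, "CPU"), (0, "IO:DataFileWrite"), (1, "CPU")]
for start, stack in aas_series(demo).items():
    print(start, stack)
```

Widening `bucket_s` (for example to 60) averages the same samples over a longer window, which smooths the chart for 10 to 30 minute runs.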
### PG core optimization patches (separate effort)
Three patches tested on the PostgreSQL 19dev branch [`pgq-perf-experiments`](https://github.com/NikolayS/postgres/tree/pgq-perf-experiments):
1. NUM_XLOGINSERT_LOCKS 8→32 (+5-12%)
2. BulkInsertState for executor INSERT (+5-10%)
3. Multi-target blocks, 4 slots (+31-45% combined)
## TODO
- [ ] Rerun PgQ and PgQue `insert_event()` with identical ~2 KiB text payloads (true apples-to-apples)
- [ ] Rerun PgQue `send()` with ~1 KiB jsonb payloads against `pgq.insert_event()` with text payloads of the same size
- [ ] 30-min sustained runs with pg_ash + pg-flight-recorder
- [ ] Test on Linux (io_uring, CPU pinning, commit_delay)
- [ ] Test on cloud instances (RDS, Cloud SQL) — the actual pgque target
- [ ] Consumer throughput benchmarks (not just producer)
- [ ] Concurrent producer + consumer
- [ ] Compare with PGMQ and River under same conditions
## Related
- PgQ benchmark issue: https://github.com/NikolayS/pgq/issues/1
- PgQue benchmark issue: https://github.com/NikolayS/pgque/issues/45
- PgQue benchmark PR: https://github.com/NikolayS/pgque/pull/44
- PG core patches: https://github.com/NikolayS/postgres/tree/pgq-perf-experiments
- LinkedIn discussion: https://www.linkedin.com/feed/update/urn:li:activity:7449044491883159552