PgQ v3.5.1 vs PgQue pre-0.1.0: throughput benchmark with pg_ash + pg-flight-recorder telemetry
## Goal

Benchmark original PgQ (v3.5.1, PL-only mode) and PgQue (pre-0.1.0) side by side on the same hardware with:

- Identical payload sizes and formats
- pg_cron ticker + table rotation running
- pg_ash (1s sampling) + pg-flight-recorder telemetry
- Performance Insights-style visualization
- Sustained runs (10-30 min) to capture checkpoint behavior

## Background

A LinkedIn discussion about PostgreSQL queue throughput vs RedPanda prompted benchmarking PgQ and PgQue (the modern repackaging of PgQ as an anti-extension). Previous benchmarking (see below) used mismatched payloads in some comparisons:

- PgQue `send()` used ~1 KiB jsonb
- PgQ `insert_event()` used ~2 KiB text
- This is **not** apples-to-apples

## What we have so far

### Hardware

- Apple Silicon, 10 cores, 24 GiB RAM, APFS SSD
- PostgreSQL 18.3 (Homebrew)

### Software versions

- **PgQ:** 3.5.1 (PL-only mode via `pgq_pl_only.sql`, no C extension)
- **PgQue:** pre-0.1.0 (main branch as of 2026-04-14)
- **pg_ash:** latest (1-second wait event sampling)
- **pg-flight-recorder:** latest (WAL, checkpoint, I/O snapshots)

### Tuning applied

```sql
synchronous_commit = off
shared_buffers = 2GB
max_wal_size = 4GB
wal_level = minimal
wal_compression = lz4
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 400
bgwriter_lru_multiplier = 4.0
```

### Results so far

**10-min run, ~2 KiB payloads, same format, 8 clients (fair comparison):**

| API | 10-min avg ev/s | vs PgQ |
|-----|----------------|--------|
| PgQ `insert_event()` | 83,294 | baseline |
| PgQue `insert_event()` | 73,278 | -12% |
| PgQue `send()` (~1 KiB jsonb, NOT apples-to-apples) | 81,242 | -2% |

**15-min run with telemetry:**

| API | 15-min avg ev/s |
|-----|----------------|
| PgQue `send()` (~1 KiB jsonb) | 81,691 |
| PgQ `insert_event()` (~2 KiB text) | 70,871 |

**Wait event profile (pg_ash, 30 min combined):**

| Wait type | Share | Color | Interpretation |
|-----------|-------|-------|----------------|
| CPU (on CPU) | 49.3% | Green | useful work |
| Client:ClientRead | 35.5% | Yellow | pgbench overhead |
| IO:DataFileWrite | 6.2% | Blue | checkpoint I/O |
| Lock:extend | 4.0% | Red | relation extension |
| LWLock:BufferContent | 2.7% | Pink | buffer contention |

**Key finding:** bgwriter tuning (`bgwriter_delay = 50ms`, `bgwriter_lru_maxpages = 400`) reduced IO:DataFileWrite from 57% to 6.2%.

### Performance Insights-style ASH chart

[Available in PR](https://github.com/NikolayS/pgque/pull/44). It shows 1-second AAS stacked by wait type with the pg_ash color scheme.

### PG core optimization patches (separate effort)

3 patches tested on PG 19dev branch [`pgq-perf-experiments`](https://github.com/NikolayS/postgres/tree/pgq-perf-experiments):

1. `NUM_XLOGINSERT_LOCKS` 8→32 (+5-12%)
2. `BulkInsertState` for executor INSERT (+5-10%)
3. Multi-target blocks, 4 slots (+31-45% combined)

## TODO

- [ ] Rerun with identical ~2 KiB text payloads for PgQ and PgQue `insert_event()`: true apples-to-apples
- [ ] Rerun PgQue `send()` with ~1 KiB jsonb payloads and compare against `pgq.insert_event()` using same-size text
- [ ] 30-min sustained runs with pg_ash + pg-flight-recorder
- [ ] Test on Linux (io_uring, CPU pinning, commit_delay)
- [ ] Test on cloud instances (RDS, Cloud SQL), the actual pgque target
- [ ] Consumer throughput benchmarks (not just producer)
- [ ] Concurrent producer + consumer
- [ ] Compare with PGMQ and River under same conditions

## Related

- PgQ benchmark issue: https://github.com/NikolayS/pgq/issues/1
- PgQue benchmark issue: https://github.com/NikolayS/pgque/issues/45
- PgQue benchmark PR: https://github.com/NikolayS/pgque/pull/44
- PG core patches: https://github.com/NikolayS/postgres/tree/pgq-perf-experiments
- LinkedIn discussion: https://www.linkedin.com/feed/update/urn:li:activity:7449044491883159552
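
## Appendix: re-deriving the "vs PgQ" column

The "vs PgQ" percentages in the 10-min results table can be re-derived from the reported averages. The sketch below is only a sanity check on that arithmetic. The dictionary and function names are illustrative and are not part of PgQ, PgQue, or the benchmark scripts.

```python
# Average events/second as reported in the 10-min results table.
RESULTS_EV_PER_SEC = {
    "PgQ insert_event()": 83_294,
    "PgQue insert_event()": 73_278,
    "PgQue send()": 81_242,  # ~1 KiB jsonb, not apples-to-apples
}


def relative_change_pct(value: float, baseline: float) -> float:
    """Return the signed percentage difference of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


baseline = RESULTS_EV_PER_SEC["PgQ insert_event()"]
for api, ev_per_sec in RESULTS_EV_PER_SEC.items():
    delta = relative_change_pct(ev_per_sec, baseline)
    print(f"{api:<22} {ev_per_sec:>7,} ev/s  {delta:+.1f}%")
```

With one decimal place, the `send()` row comes out at -2.5%, which the table reports as -2%.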
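
## Appendix: equal-size payloads for the reruns

The apples-to-apples reruns in the TODO list need text and JSON payloads of matching size. The sketch below is one way to generate them, assuming "identical" means identical serialized input length in bytes. The function names are hypothetical and are not part of PgQ or PgQue. The stored size of a jsonb value may differ from its input text length, so this equalizes only what the client sends.

```python
import json


def make_text_payload(size_bytes: int) -> str:
    """Return an ASCII string that is exactly size_bytes long."""
    return "x" * size_bytes


def make_json_payload(size_bytes: int) -> str:
    """Return a JSON document whose serialized text is exactly size_bytes long."""
    # Compact separators keep the wrapper overhead fixed and minimal.
    empty = json.dumps({"data": ""}, separators=(",", ":"))
    padding = size_bytes - len(empty)
    if padding < 0:
        raise ValueError(f"size_bytes must be at least {len(empty)}")
    return json.dumps({"data": "x" * padding}, separators=(",", ":"))


for size in (1024, 2048):
    text_payload = make_text_payload(size)
    json_payload = make_json_payload(size)
    # Both payloads are pure ASCII, so byte length equals character length.
    assert len(text_payload.encode()) == len(json_payload.encode()) == size
```

The generated strings can then be written into whatever pgbench script or client drives the producers, so that every API under test receives the same number of input bytes per event.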