Subtransactions and performance degradation, part 2: issues on replicas while a long-running transaction lasts on the primary
In #20 (closed), we explored how having 65+ subtransactions in a single transaction can hurt performance on a single node (the "Cybertec case", https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/).
Here the goal is to build a synthetic, reproducible benchmark that shows how performance degrades on a replica while a long-running transaction is open on the primary and many other transactions, each containing only a small number of subtransactions, are being executed (the "GitLab case", gitlab-org/gitlab#338410 (closed)).
See also
- gitlab-org/gitlab#338346 (comment 652623314)
- SUBTRANS_XACTS_PER_PAGE: https://github.com/postgres/postgres/blob/4bf0bce161097869be5a56706b31388ba15e0113/src/backend/access/transam/subtrans.c#L52. NUM_SUBTRANS_BUFFERS is 32, each page is 8 KiB, and the XID size is 4 bytes. This gives 8192/4 = 2048 XIDs per page and 2048 * 32 = 65536 values overall, which defines the age of the long-running transaction that leads to SLRU overflow
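
The arithmetic above can be written out as a small sketch. The constant values are the ones quoted in this issue; the helper `exceeds_subtrans_slru` is a hypothetical name used only for illustration, and treating the overflow point as a hard threshold is a simplifying assumption, not a description of how Postgres evicts SLRU pages.

```python
# Values quoted in the issue text (not read from a running server).
BLCKSZ = 8192               # page size in bytes (8 KiB)
XID_SIZE = 4                # size of one transaction ID in bytes
NUM_SUBTRANS_BUFFERS = 32   # number of pages in the subtrans SLRU cache

# XIDs that fit on one SLRU page, and in the whole cache.
SUBTRANS_XACTS_PER_PAGE = BLCKSZ // XID_SIZE
SLRU_CAPACITY_XIDS = SUBTRANS_XACTS_PER_PAGE * NUM_SUBTRANS_BUFFERS


def exceeds_subtrans_slru(xid_age: int) -> bool:
    """Hypothetical helper: True if a transaction of this age (in XIDs)
    spans more XIDs than the subtrans SLRU cache can hold at once."""
    return xid_age > SLRU_CAPACITY_XIDS


if __name__ == "__main__":
    print(SUBTRANS_XACTS_PER_PAGE, SLRU_CAPACITY_XIDS)
```

In a benchmark, the age of the long-running transaction on the primary could be compared against `SLRU_CAPACITY_XIDS` to tell whether a run is expected to be below or above the overflow point.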
TODO
- check Postgres versions (SubtransControlLock was renamed to SubtransSLRU in PG13):
  - 12
  - 13
  - 14
- compare workloads that use SAVEPOINTs and don't use them at all
- see if "rare use of SAVEPOINTs" is OK
- alternatives to long-running transaction: ANALYZE, VACUUM, long-running transaction on replica with hot_standby_feedback=on
- can RELEASEs help? // it looks like NO
- does this problem affect the primary? (SELECTs on the primary, 1-node case) // it looks like NO
- check pg_stat_slru while the problem is happening (PG13+)
- test Andrey's patches https://www.postgresql.org/message-id/flat/494C5E7F-E410-48FA-A93E-F7723D859561%40yandex-team.ru#18c79477bf7fc44a3ac3d1ce55e4c169, https://commitfest.postgresql.org/34/2627/
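
For the "with SAVEPOINTs vs. without" comparison in the TODO list, one possible starting point is a generator for pgbench custom scripts. This is only a sketch under assumptions: it targets the standard `pgbench_accounts` table created by `pgbench -i`, the function name `pgbench_script` and its parameters are hypothetical, and the number of writes per transaction is arbitrary. Each SAVEPOINT is followed by an UPDATE because a subtransaction is assigned its own XID only when it writes.

```python
UPDATE = "UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;"


def pgbench_script(savepoints: int, writes: int = 5, rows: int = 100_000) -> str:
    """Build a pgbench custom script with a fixed number of writes per
    transaction, the first `savepoints` of which run in their own
    subtransaction. savepoints=0 gives the no-SAVEPOINT baseline."""
    if not 0 <= savepoints <= writes:
        raise ValueError("savepoints must be between 0 and writes")
    lines = [f"\\set aid random(1, {rows})", "BEGIN;"]
    for i in range(1, writes + 1):
        if i <= savepoints:
            # Open a subtransaction right before the write so it gets an XID.
            lines.append(f"SAVEPOINT s{i};")
        lines.append(UPDATE)
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the two workload variants: baseline and a few subtransactions.
    print(pgbench_script(0))
    print(pgbench_script(3))
```

The generated text can be saved to files and passed to `pgbench -f` on the primary, keeping the number of writes identical across variants so that only the use of SAVEPOINTs differs.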
Edited by Nikolay Samokhvalov