PostgreSQL `maintenance_work_mem` needs to be tuned for larger workloads
## Summary
Certain maintenance operations, like `autovacuum`, need working memory, and by default they are limited by `maintenance_work_mem`: 64MB (PostgreSQL default) / 16MB (Omnibus default).

More memory allows for potentially more efficient work, much as with client queries (which use `work_mem`), but unlike client queries, `maintenance_work_mem` can be set higher. From the PostgreSQL docs:

> Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have many of them running concurrently, it's safe to set this value significantly larger than work_mem. Larger settings might improve performance for vacuuming and for restoring database dumps.
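For reference, the current value can be checked from any psql session, and (per the docs quote above) raised for a single session without affecting other work. A minimal sketch; the values in the comments are stock defaults, not GitLab-specific:

```sql
-- Inspect the current limits
SHOW maintenance_work_mem;  -- 64MB on a default PostgreSQL install
SHOW work_mem;              -- 4MB by default, and used per sort/hash, concurrently

-- maintenance_work_mem can also be raised for one session only,
-- e.g. before running a manual VACUUM or CREATE INDEX
SET maintenance_work_mem = '512MB';
```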
I can identify three customer incidents where a low `maintenance_work_mem` was identified as problematic, or where increasing it was recommended to address the situation their environment was in. See references below.
The third case suggests there is a more serious issue on large instances, since `autovacuum` was unable to cope with the volume of change that had accumulated on the `ci_builds` table. Changes like [a] switching the primary key of `ci_builds` from 32-bit to 64-bit, or [b] adding partitioning keys, create a 'perfect storm' for `autovacuum`. Because PostgreSQL uses MVCC (an update writes a new row version rather than modifying the row in place), these changes duplicate every record in the table, creating 'dead tuples' that need to be recovered by `autovacuum`, with added impact on the indexes as well.
Based on what I observed, in due course there would have been an outage as the database went into read-only mode: vacuuming `ci_builds` appeared to be taking an unbounded amount of time. As the situation persisted (source), the following would have happened:

- 10 million transactions before the wraparound limit is reached: WARNING messages consisting of a countdown are logged.
- 1 million transactions before the wraparound limit is reached: PostgreSQL goes into READ-ONLY mode.

We know from another customer's experience that an environment can run all the way through to the final threshold where the database goes read-only - gitlab#381443.
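The distance to those wraparound thresholds can be monitored directly from the system catalogs; the snippet below is a standard check (the threshold figures in the comment are PostgreSQL defaults):

```sql
-- age(datfrozenxid) = transactions elapsed since the oldest unfrozen XID.
-- Anti-wraparound ("to prevent wraparound") autovacuums are forced at
-- autovacuum_freeze_max_age (200 million by default); the hard stop that
-- the WARNING countdown leads to sits near the ~2 billion XID limit.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```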
## Proposal
We already tune PostgreSQL shared memory based on the memory in the system; we should also automatically tune `maintenance_work_mem` so that, as customer environments scale, PostgreSQL is able to deploy those resources to keep the database healthy and perform other operations more quickly:

> maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY
There's another discussion in one of the cited issues. (internal link)
I suggest scaling it up with system RAM on this basis:

- 8 GB RAM: increase the value to 256MB
- 16 GB RAM: increase the value to 512MB
- 32 GB RAM and higher: 1GB
There is a case for setting it higher beyond 32 GB for non-autovacuum work, since 1GB is the most `autovacuum` will use regardless of the setting. We should get a recommendation from our DBREs prior to implementation, though this isn't essential: 1GB is already a significant improvement over the defaults.
This is roughly how this maps to the reference architectures:

| arch size | db server memory | proposed maintenance_work_mem |
|---|---|---|
| 1k | shared (7.2 GB)* | default: 16MB or 64MB |
| 2k | 7.5 GB | default: 16MB or 64MB |
| 3k | 7.5 GB | default: 16MB or 64MB |
| 5k | 15 GB | 256MB |
| 10k | 30 GB | 512MB |
| 25k | 60 GB | 1GB |
| 50k | 120 GB | 1GB |
\*Omnibus configures shared memory to 25% of server memory. From 2k onwards PostgreSQL has dedicated servers and memory, but the majority of the remaining RAM is reserved for client `work_mem`, with the kernel filling the rest with disk cache (which is itself useful for PostgreSQL).
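If implemented, the change itself is small. A sketch of a cluster-wide adjustment done directly in SQL; on Omnibus this would instead be driven from `gitlab.rb` so that a reconfigure doesn't revert it:

```sql
-- Persist the new value in postgresql.auto.conf and reload; existing
-- sessions that haven't overridden it pick up the new default, and
-- autovacuum workers use it on their next run
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();
```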
## References
- A customer on a cloud database service; DBREs recommended a higher value in light of apparent issues with `autovacuum` completing, and in light of the size of their database (especially `ci_builds`). (internal link)
  - This thread has a discussion about this specific feature.
  - Quote from elsewhere in that issue, for future reference:
    > Note however that vacuum caps its memory use at 1 GB; enough to process about 179 million dead tuples in one pass. Vacuuming a table with more dead tuples than that will require multiple passes through the table's indexes, which can cause vacuum to take significantly longer.
- A ticket (internal link) relating to a second customer with an on-prem, self-managed Omnibus instance (single node, but very large). Time was of the essence, and we wanted to make sure that a constraint validation could access all the resources it could make use of. One of our DBREs recommended:
  > another potential way to improve the execution plan would be increasing working memory in the db session where they are running the ALTER TABLE ... VALIDATE CONSTRAINT command.
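Applied to that case, the session-scoped approach looks like the sketch below. The constraint name is hypothetical; note that an interactive session can go beyond the 1 GB that autovacuum itself will use:

```sql
-- Give this one session a much larger budget for the validation;
-- the setting reverts when the session ends (or on RESET)
SET maintenance_work_mem = '2GB';
ALTER TABLE ci_builds VALIDATE CONSTRAINT fk_example;  -- hypothetical constraint name
RESET maintenance_work_mem;
```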
- A recent emergency (internal link) where an upgrade failed because a `ci_builds` migration could not get an exclusive table lock.
  - Root cause was found to be a pre-existing lock held by an 11-12 day old `autovacuum` operation that was performing a `to prevent wraparound` vacuum. This type of operation is not optional, and is more forceful than a normal `autovacuum`.
  - When we ran it manually, we found that 64MB of working memory was only sufficient to tackle (it seemed) one dead tuple for each index on the table, per cycle. Each cycle was taking 30+ minutes interactively, and would have been slower when executed by `autovacuum`.
    - See the quote above: "Vacuuming a table .. will require multiple passes through the tables indexes"
    - See also this mailing list post describing the same issue of slow progress vacuuming indexes.
  - They had at least hundreds of thousands of dead tuples to recover from the table and indexes, so at 30+ minutes per cycle this was unlikely to complete.
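When diagnosing a situation like this, the dead-tuple backlog and vacuum history are visible in the standard statistics views; a minimal check:

```sql
-- Tables with the largest dead-tuple backlog, with their vacuum history
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum, last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```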