PostgreSQL `maintenance_work_mem` needs to be tuned for larger workloads
## Summary
Certain maintenance operations, like `autovacuum`, need working memory, and by default they are limited by `maintenance_work_mem`: 64MB (PostgreSQL default) / 16MB (Omnibus default).

More memory allows for potentially more efficient work, much as with client queries (which use `work_mem`), but unlike client queries, `maintenance_work_mem` can be set higher. From the PostgreSQL docs:

> Since only one of these operations can be executed at a time by a database session, and an installation normally doesn't have many of them running concurrently, it's safe to set this value significantly larger than work_mem. Larger settings might improve performance for vacuuming and for restoring database dumps.
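For reference, the current value can be checked from any psql session, and (per the docs quote above) raised for a single session without affecting other work. A minimal sketch; the values in the comments are stock defaults, not GitLab-specific:

```sql
-- Inspect the current limits
SHOW maintenance_work_mem;  -- 64MB on a default PostgreSQL install
SHOW work_mem;              -- 4MB by default, and used per sort/hash, concurrently

-- maintenance_work_mem can also be raised for one session only,
-- e.g. before running a manual VACUUM or CREATE INDEX
SET maintenance_work_mem = '512MB';
```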
I can identify three customer incidents where a low `maintenance_work_mem` was identified as problematic, or where increasing it was recommended to address the situation their environment was in. See references below.
The third case suggests there is a more serious issue on large instances, since `autovacuum` was unable to cope with the volume of change that had accumulated on the `ci_builds` table. Changes like [a] switching the primary key of `ci_builds` from 32-bit to 64-bit, or [b] adding partitioning keys, create a 'perfect storm' for `autovacuum`. Because PostgreSQL uses MVCC (an update writes a new row version rather than modifying the row in place), these changes duplicate every record in the table, creating 'dead tuples' that need to be recovered by `autovacuum`, with added impact on the indexes as well.
Based on what I observed, in due course there would have been an outage as the database went into read-only mode: vacuuming `ci_builds` appeared to be taking an unbounded amount of time. As the situation persisted (source), the following would have happened:

- 10 million transactions before the wraparound limit is reached: WARNING messages consisting of a countdown are logged.
- 1 million transactions before the wraparound limit is reached: PostgreSQL goes into READ-ONLY mode.

We know from another customer's experience that an environment can run all the way through to the final threshold where the database goes read-only - gitlab#381443.
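The distance to those wraparound thresholds can be monitored directly from the system catalogs; the snippet below is a standard check (the threshold figures in the comment are PostgreSQL defaults):

```sql
-- age(datfrozenxid) = transactions elapsed since the oldest unfrozen XID.
-- Anti-wraparound ("to prevent wraparound") autovacuums are forced at
-- autovacuum_freeze_max_age (200 million by default); the hard stop that
-- the WARNING countdown leads to sits near the ~2 billion XID limit.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```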
## Proposal
We already tune PostgreSQL shared memory based on the memory in the system; we should also automatically tune `maintenance_work_mem` so that, as customer environments scale, PostgreSQL is able to deploy those resources to keep the database healthy and perform other operations more quickly:

> maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY
There's another discussion in one of the cited issues. (internal link)
I suggest scaling it up with system RAM on this basis:

- 8 GB RAM: increase the value to 256MB
- 16 GB RAM: increase the value to 512MB
- 32 GB RAM and higher: 1GB
There is a case for setting it higher beyond 32 GB for non-autovacuum work, since 1GB is the most `autovacuum` will use regardless of the setting. We should get a recommendation from our DBREs prior to implementation, though this isn't essential: 1GB is already a significant improvement over the defaults.
This is roughly how this maps to the reference architectures:

| arch size | db server memory | proposed maintenance_work_mem |
|---|---|---|
| 1k | shared (7.2 GB)* | default: 16MB or 64MB |
| 2k | 7.5 GB | default: 16MB or 64MB |
| 3k | 7.5 GB | default: 16MB or 64MB |
| 5k | 15 GB | 256MB |
| 10k | 30 GB | 512MB |
| 25k | 60 GB | 1GB |
| 50k | 120 GB | 1GB |
\*Omnibus configures shared memory to 25% of server memory. From 2k onwards PostgreSQL has dedicated servers and memory, but the majority of the remaining RAM is reserved for client `work_mem`, with the kernel filling the rest with disk cache (which is itself useful for PostgreSQL).
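If implemented, the change itself is small. A sketch of a cluster-wide adjustment done directly in SQL; on Omnibus this would instead be driven from `gitlab.rb` so that a reconfigure doesn't revert it:

```sql
-- Persist the new value in postgresql.auto.conf and reload; existing
-- sessions that haven't overridden it pick up the new default, and
-- autovacuum workers use it on their next run
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();
```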
## References
- A customer on a cloud database service; DBREs recommended a higher value in light of apparent issues with `autovacuum` completing, and in light of the size of their database (especially `ci_builds`). (internal link)
  - This thread has a discussion about this specific feature.
  - Quote from elsewhere in that issue, for future reference:
    > Note however that vacuum caps its memory use at 1 GB; enough to process about 179 million dead tuples in one pass. Vacuuming a table with more dead tuples than that will require multiple passes through the table's indexes, which can cause vacuum to take significantly longer.
- A ticket (internal link) relating to a second customer with an on-prem, self-managed Omnibus instance (single node, but very large). Time was of the essence, and we wanted to make sure that a constraint validation could access all the resources it could make use of. One of our DBREs recommended:
  > another potential way to improve the execution plan would be increasing working memory in the db session where they are running the ALTER TABLE ... VALIDATE CONSTRAINT command.
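Applied to that case, the session-scoped approach looks like the sketch below. The constraint name is hypothetical; note that an interactive session can go beyond the 1 GB that autovacuum itself will use:

```sql
-- Give this one session a much larger budget for the validation;
-- the setting reverts when the session ends (or on RESET)
SET maintenance_work_mem = '2GB';
ALTER TABLE ci_builds VALIDATE CONSTRAINT fk_example;  -- hypothetical constraint name
RESET maintenance_work_mem;
```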
- A recent emergency (internal link) where an upgrade failed because a `ci_builds` migration could not get an exclusive table lock.
  - Root cause was found to be a pre-existing lock held by an 11-12 day old `autovacuum` operation that was performing a `to prevent wraparound` vacuum. This type of operation is not optional, and is more forceful than a normal `autovacuum`.
  - When we ran it manually, we found that 64MB of working memory was only sufficient to tackle (it seemed) one dead tuple for each index on the table, per cycle. Each cycle was taking 30+ minutes interactively, and would have been slower when executed by `autovacuum`.
    - See the quote above: "Vacuuming a table .. will require multiple passes through the tables indexes"
    - See also this mailing list post describing the same issue of slow progress vacuuming indexes.
  - They had at least hundreds of thousands of dead tuples to recover from the table and indexes, so at 30+ minutes per cycle this was unlikely to complete.
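When diagnosing a situation like this, the dead-tuple backlog and vacuum history are visible in the standard statistics views; a minimal check:

```sql
-- Tables with the largest dead-tuple backlog, with their vacuum history
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum, last_vacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```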