Autovacuum considerations
Overview
Autovacuum is a standard mechanism based on scheduled vacuums triggered by thresholds; it removes the dead tuples that PostgreSQL leaves behind when a table is modified, in order to prevent bloat. As mentioned in a previous analysis (https://gitlab.com/gitlab-com/infrastructure/issues/4650), autovacuum is consuming 75% of the read I/O while performing less than 3% of the write I/O for block removal or marking.
The recommendation here is to keep autovacuum from running too frequently during the busiest hours, since those executions are considered excessive given the small portion of blocks actually marked and cleaned. The larger the relations are, the harder it is to accurately set a threshold above which bloat stays low without constantly scanning the relation. The common practice in newer versions is to partition the tables, so the data is better segmented and vacuums do not need to run over the entire set.
The current analysis of autovacuum's behaviour is backed by stats, logs and the Prometheus configuration. It has been analyzed before and after the migration with the applied configuration changes. At the end, there are suggestions on how the autovacuum configuration could be changed and monitored to improve the current behaviour.
Before migration status
Before migration we had the following situation:
With the following autovacuum and autoanalyze daily count stats:
Stat | min | 50% | 90% | 95% | 99% | 99.99% | max |
---|---|---|---|---|---|---|---|
autovacuum | 205 | 347 | 423 | 449 | 506 | 530 | 531 |
autoanalyze | 89 | 224 | 299 | 324 | 349 | 385 | 386 |
Autovacuum never went below 200 runs per day, due to the very low thresholds in the autovacuum configuration:
autovacuum_naptime = 1min # default 1min
autovacuum_vacuum_threshold = 50 # default 50
autovacuum_analyze_threshold = 50 # default 50
autovacuum_vacuum_scale_factor = 0.005 # default 0.2
autovacuum_analyze_scale_factor = 0.005 # default 0.1
autovacuum_vacuum_cost_delay = 5ms # default 20ms
autovacuum_vacuum_cost_limit = 6000 # default 200
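For context, autovacuum fires on a table once n_dead_tup exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (autoanalyze works the same way on the modified-tuple counter). A minimal sketch of how that trigger point can be estimated per table under the global settings above, assuming no per-table overrides and using the standard pg_stat_user_tables and pg_class views:

```sql
-- Estimate the dead-tuple count at which autovacuum would fire for each table,
-- assuming the global threshold (50) and scale factor (0.005) shown above.
SELECT s.relname,
       c.reltuples::bigint                 AS approx_live_tuples,
       s.n_dead_tup,
       (50 + 0.005 * c.reltuples)::bigint  AS estimated_vacuum_trigger
FROM   pg_stat_user_tables s
JOIN   pg_class c ON c.oid = s.relid
ORDER  BY c.reltuples DESC
LIMIT  20;
```

With a scale factor of 0.005, a 500M-tuple relation is only scheduled for vacuum after roughly 2.5M dead tuples have accumulated.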
And in some cases the threshold is effectively nonexistent: autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor are set to 0, with autovacuum_vacuum_threshold and autovacuum_analyze_threshold set to 5000 and 2500 respectively (negligible compared to the size of some tables), for the following tables:
project_ci_cd_settings, ci_builds_metadata, projects, ci_pipelines, ci_builds, project_repository_states, project_mirror_data, merge_request_diff_files, routes, geo_repository_updated_events, ci_job_artifacts, user_interacted_projects, web_hook_logs, project_authorizations, ci_build_trace_sections, notification_settings, ci_stages, uploads, merge_requests, project_statistics, issue_assignees, geo_event_log, keys, merge_request_diff_commits, events, push_event_payloads, system_note_metadata, issue_metrics, issues, label_links, labels, lfs_objects, lfs_objects_projects, merge_request_diffs, members, merge_request_metrics, namespaces, notes, personal_access_tokens, project_features, project_import_data, push_rules, protected_branch_merge_access_levels, protected_branch_push_access_levels, protected_branches, services, taggings, todos, user_agent_details, users
The scale factor for both autovacuum and autoanalyze is too low or missing, making the autovacuum workers fire vacuum (and, to a lesser extent, analyze) too frequently for too many of the tables.
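For illustration, a per-table override of this kind is applied through storage parameters and can be inspected in pg_class.reloptions. A hedged sketch (ci_builds is just one of the tables listed above; the values mirror the 0 scale factors and 5000/2500 thresholds described):

```sql
-- How a per-table override like the ones described above would typically be set.
ALTER TABLE ci_builds SET (
  autovacuum_vacuum_scale_factor  = 0,
  autovacuum_vacuum_threshold     = 5000,
  autovacuum_analyze_scale_factor = 0,
  autovacuum_analyze_threshold    = 2500
);

-- List the tables that currently carry autovacuum-related storage parameters.
SELECT relname, reloptions
FROM   pg_class
WHERE  reloptions::text ILIKE '%autovacuum%';
```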
Current status
After migration the situation is a bit better, with a rough improvement of 20% fewer vacuums at the 50th percentile:
With the following autovacuum and autoanalyze daily count stats:
Stat | min | 50% | 90% | 95% | 99% | 99.99% | max |
---|---|---|---|---|---|---|---|
autovacuum | 56 | 198 | 307 | 481 | 673 | 743 | 745 |
autoanalyze | 41 | 121 | 191 | 254 | 282 | 350 | 351 |
The improvement is related to the removal of the table-specific autovacuum configuration, leaving a global scale factor of 0.005 for all tables. For some smaller or less active tables (producing fewer dead tuples) this could probably be improved further. Currently we have pg_stat_all_tables.n_dead_tup data in Prometheus, so we can look at its evolution over a 2-week window.
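The same counters can also be sampled directly from the statistics view; a simple snapshot sketch, assuming the standard pg_stat_all_tables columns (which is presumably where the Prometheus metric comes from):

```sql
-- Current dead-tuple ratio and autovacuum activity per table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / greatest(n_live_tup, 1), 4) AS dead_ratio,
       autovacuum_count,
       last_autovacuum
FROM   pg_stat_all_tables
WHERE  n_dead_tup > 0
ORDER  BY n_dead_tup DESC
LIMIT  25;
```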
As an example of the issue being described, the table merge_request_diff_files (~500M tuples) reached around 2M dead tuples (~0.004 of the table) and was autovacuumed every 2-4 days. Looking at its growth rate, the autovacuum count since migration until today is 4, each run taking around 6 minutes to complete, with a total of 7M tuples removed. Comparing this with the data from the beginning of the month until migration (less than 2 weeks), the autovacuum count was 1301 with a total of 1.579M tuples removed. So for this concrete table a scale factor of 0.005 seems to fit nicely.
The table merge_request_diff_commits (~1018M tuples) currently shows a rather inconstant pattern, with a huge spike of 2.5M dead tuples in an hour and a smooth growth of 800k tuples over 11 days. The table is autovacuumed every 4-11 days. Since migration there have been 2 autovacuums, both taking around 10 minutes to complete, with a total of 3.28M tuples removed.
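The per-run durations and removed-tuple counts quoted above are the kind of detail autovacuum writes to the server log when logging is enabled; a hedged configuration sketch (the analysis does not state which logging threshold the cluster actually uses):

```sql
-- Log every autovacuum run with its duration and tuples removed/remaining.
-- A value of 0 logs all runs; a value in milliseconds logs only slower ones.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
```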
Candidate tables for custom configuration
Following is a list of all tables that can be considered candidates for keeping the current autovacuum scale factor. The criterion used to choose these tables among the others is that the mean number of autovacuums is not more than 1 per day and the mean number of tuples removed is greater than or equal to 125k:
Table | TOAST of table | Autovacuum count | Autovacuum mean duration | Autovacuum mean removed tuples | Autovacuum mean % of tuples removed (total table tuples) |
---|---|---|---|---|---|
merge_request_diff_files | | 4 | 7 min | 7M | 1.7% (413M) |
merge_request_diff_commits | | 2 | 10 min | 3M | 0.32% (1019M) |
pg_toast.pg_toast_21971303 | merge_request_diff_files | 3 | 3 min | 3M | 1.5% (170M) |
ci_stages | | 4 | 2 min | 547k | 1.32% (42M) |
ci_builds_metadata | | 6 | 21.52 sec | 539k | 1.94% (28M) |
merge_request_diffs | | 11 | 34.08 sec | 337k | 1.66% (20M) |
events | | 2 | 10 min | 321k | 0.16% (197M) |
push_event_payloads | | 1 | 5 min | 294k | 0.19% (153M) |
geo_repository_updated_events | | 1 | 3 min | 224k | 0.32% (70M) |
notes | | 2 | 24 min | 183k | 0.22% (82M) |
issue_metrics | | 5 | 6.37 sec | 148k | 1.92% (8M) |
routes | | 5 | 21.99 sec | 126k | 1.39% (9M) |
The autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor should be set explicitly to 0.005 for those candidate tables. For the other tables we could then raise the global autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor to 0.01, keeping an eye on pg_stat_all_tables.n_dead_tup, to get an estimation of dead tuples, and on pg_stat_all_tables.autovacuum_count, tracking them on a day-by-day basis and stopping when we find a good balance between the number of autovacuums done per table and the percentage of dead tuples removed (compared to the size of the table). Then we can repeat these steps until we reach the more conservative scale factor of 0.20 (0.10 for analyze), which is more suitable for small tables with fewer than 1M tuples and not much activity (few dead tuples produced).
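As a concrete sketch of the proposal, assuming ALTER SYSTEM is acceptable on this cluster (otherwise the same values belong in postgresql.conf), and using one of the candidate tables from the list above:

```sql
-- Keep the aggressive 0.005 scale factor only on the candidate tables.
ALTER TABLE merge_request_diff_files SET (
  autovacuum_vacuum_scale_factor  = 0.005,
  autovacuum_analyze_scale_factor = 0.005
);
-- ...repeat for the other candidate tables in the list above.

-- Relax the global default for every other table, then watch
-- pg_stat_all_tables.n_dead_tup and autovacuum_count day by day.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor  = 0.01;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.01;
SELECT pg_reload_conf();
```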