Autovacuum considerations
Overview
Autovacuum is a standard mechanism based on scheduled vacuums triggered by thresholds; it removes the dead tuples that PostgreSQL leaves behind when a table is modified, in order to prevent bloat. As mentioned in a previous analysis (https://gitlab.com/gitlab-com/infrastructure/issues/4650), autovacuum is consuming 75% of the read I/O while performing less than 3% of the write I/O for block removal or marking.
The recommendation here is to keep autovacuum from running too frequently during the busiest hours, since those executions are considered excessive given the small portion of blocks actually marked and cleaned. The larger the relations are, the harder it is to accurately set a threshold above which bloat stays low without constantly scanning the relation. The common practice in newer versions is to partition the tables, so the data is better segmented and vacuums do not need to run over the entire set.
The current analysis of autovacuum's behaviour is backed by stats, logs and the Prometheus configuration. It has been analyzed before and after the migration with the applied configuration changes. At the end, there are suggestions on how the autovacuum configuration could be changed and monitored to improve the current behaviour.
Before migration status
Before migration we had the following situation:
With the following autovacuum and autoanalyze daily count stats:
Stat | min | 50% | 90% | 95% | 99% | 99.99% | max |
---|---|---|---|---|---|---|---|
autovacuum | 205 | 347 | 423 | 449 | 506 | 530 | 531 |
autoanalyze | 89 | 224 | 299 | 324 | 349 | 385 | 386 |
Autovacuum never went below 200 runs per day, due to the very low thresholds in the autovacuum configuration:
autovacuum_naptime = 1min # default 1min
autovacuum_vacuum_threshold = 50 # default 50
autovacuum_analyze_threshold = 50 # default 50
autovacuum_vacuum_scale_factor = 0.005 # default 0.2
autovacuum_analyze_scale_factor = 0.005 # default 0.1
autovacuum_vacuum_cost_delay = 5ms # default 20ms
autovacuum_vacuum_cost_limit = 6000 # default 200
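For context, autovacuum fires on a table once n_dead_tup exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples (autoanalyze works the same way on the modified-tuple counter). A minimal sketch of how that trigger point can be estimated per table under the global settings above, assuming no per-table overrides and using the standard pg_stat_user_tables and pg_class views:

```sql
-- Estimate the dead-tuple count at which autovacuum would fire for each table,
-- assuming the global threshold (50) and scale factor (0.005) shown above.
SELECT s.relname,
       c.reltuples::bigint                 AS approx_live_tuples,
       s.n_dead_tup,
       (50 + 0.005 * c.reltuples)::bigint  AS estimated_vacuum_trigger
FROM   pg_stat_user_tables s
JOIN   pg_class c ON c.oid = s.relid
ORDER  BY c.reltuples DESC
LIMIT  20;
```

With a scale factor of 0.005, a 500M-tuple relation is only scheduled for vacuum after roughly 2.5M dead tuples have accumulated.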
And in some cases the threshold is effectively nonexistent: autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor are set to 0, with autovacuum_vacuum_threshold and autovacuum_analyze_threshold set to 5000 and 2500 respectively (negligible compared to the size of some tables), for the following tables:
project_ci_cd_settings, ci_builds_metadata, projects, ci_pipelines, ci_builds, project_repository_states, project_mirror_data, merge_request_diff_files, routes, geo_repository_updated_events, ci_job_artifacts, user_interacted_projects, web_hook_logs, project_authorizations, ci_build_trace_sections, notification_settings, ci_stages, uploads, merge_requests, project_statistics, issue_assignees, geo_event_log, keys, merge_request_diff_commits, events, push_event_payloads, system_note_metadata, issue_metrics, issues, label_links, labels, lfs_objects, lfs_objects_projects, merge_request_diffs, members, merge_request_metrics, namespaces, notes, personal_access_tokens, project_features, project_import_data, push_rules, protected_branch_merge_access_levels, protected_branch_push_access_levels, protected_branches, services, taggings, todos, user_agent_details, users
The scale factor for both autovacuum and autoanalyze is too low or missing, making the autovacuum workers fire vacuum (and, to a lesser extent, analyze) too frequently for too many of the tables.
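For illustration, a per-table override of this kind is applied through storage parameters and can be inspected in pg_class.reloptions. A hedged sketch (ci_builds is just one of the tables listed above; the values mirror the 0 scale factors and 5000/2500 thresholds described):

```sql
-- How a per-table override like the ones described above would typically be set.
ALTER TABLE ci_builds SET (
  autovacuum_vacuum_scale_factor  = 0,
  autovacuum_vacuum_threshold     = 5000,
  autovacuum_analyze_scale_factor = 0,
  autovacuum_analyze_threshold    = 2500
);

-- List the tables that currently carry autovacuum-related storage parameters.
SELECT relname, reloptions
FROM   pg_class
WHERE  reloptions::text ILIKE '%autovacuum%';
```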
Current status
After migration the situation is a bit better, with a rough improvement of 20% fewer vacuums at the 50th percentile:
With the following autovacuum and autoanalyze daily count stats:
Stat | min | 50% | 90% | 95% | 99% | 99.99% | max |
---|---|---|---|---|---|---|---|
autovacuum | 56 | 198 | 307 | 481 | 673 | 743 | 745 |
autoanalyze | 41 | 121 | 191 | 254 | 282 | 350 | 351 |
The improvement is related to the removal of the table-specific autovacuum configuration, leaving a global scale factor of 0.005 for all tables. For some smaller or less active tables (producing fewer dead tuples) this could probably be improved further. Currently we have pg_stat_all_tables.n_dead_tup data in Prometheus, so we can look at its evolution over a 2-week window.
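The same counters can also be sampled directly from the statistics view; a simple snapshot sketch, assuming the standard pg_stat_all_tables columns (which is presumably where the Prometheus metric comes from):

```sql
-- Current dead-tuple ratio and autovacuum activity per table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / greatest(n_live_tup, 1), 4) AS dead_ratio,
       autovacuum_count,
       last_autovacuum
FROM   pg_stat_all_tables
WHERE  n_dead_tup > 0
ORDER  BY n_dead_tup DESC
LIMIT  25;
```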
As an example of the issue being described, the table merge_request_diff_files (~500M tuples) reached around 2M dead tuples (~0.004 of the table) and was autovacuumed every 2-4 days. Looking at its growth rate, the autovacuum count since migration until today is 4, each run taking around 6 minutes to complete, with a total of 7M tuples removed. Comparing this with the data from the beginning of the month until migration (less than 2 weeks), the autovacuum count was 1301 with a total of 1.579M tuples removed. So for this concrete table a scale factor of 0.005 seems to fit nicely.
The table merge_request_diff_commits (~1018M tuples) currently shows a rather inconstant pattern, with a huge spike of 2.5M dead tuples in an hour and a smooth growth of 800k tuples over 11 days. The table is autovacuumed every 4-11 days. Since migration there have been 2 autovacuums, both taking around 10 minutes to complete, with a total of 3.28M tuples removed.
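The per-run durations and removed-tuple counts quoted above are the kind of detail autovacuum writes to the server log when logging is enabled; a hedged configuration sketch (the analysis does not state which logging threshold the cluster actually uses):

```sql
-- Log every autovacuum run with its duration and tuples removed/remaining.
-- A value of 0 logs all runs; a value in milliseconds logs only slower ones.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
```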
Candidate tables for custom configuration
Following is a list of all tables that can be considered candidates for keeping the current autovacuum scale factor. The criterion used to choose these tables among the others is that the mean number of autovacuums is not more than 1 per day and the mean number of tuples removed is greater than or equal to 125k:
Table | TOAST of table | Autovacuum count | Autovacuum mean duration | Autovacuum mean removed tuples | Autovacuum mean % of tuples removed (total table tuples) |
---|---|---|---|---|---|
merge_request_diff_files | | 4 | 7 min | 7M | 1.7% (413M) |
merge_request_diff_commits | | 2 | 10 min | 3M | 0.32% (1019M) |
pg_toast.pg_toast_21971303 | merge_request_diff_files | 3 | 3 min | 3M | 1.5% (170M) |
ci_stages | | 4 | 2 min | 547k | 1.32% (42M) |
ci_builds_metadata | | 6 | 21.52 sec | 539k | 1.94% (28M) |
merge_request_diffs | | 11 | 34.08 sec | 337k | 1.66% (20M) |
events | | 2 | 10 min | 321k | 0.16% (197M) |
push_event_payloads | | 1 | 5 min | 294k | 0.19% (153M) |
geo_repository_updated_events | | 1 | 3 min | 224k | 0.32% (70M) |
notes | | 2 | 24 min | 183k | 0.22% (82M) |
issue_metrics | | 5 | 6.37 sec | 148k | 1.92% (8M) |
routes | | 5 | 21.99 sec | 126k | 1.39% (9M) |
The autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor should be set explicitly to 0.005 for those candidate tables. For the other tables we could then raise the global autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor to 0.01, keeping an eye on pg_stat_all_tables.n_dead_tup, to get an estimation of dead tuples, and on pg_stat_all_tables.autovacuum_count, tracking them on a day-by-day basis and stopping when we find a good balance between the number of autovacuums done per table and the percentage of dead tuples removed (compared to the size of the table). Then we can repeat these steps until we reach the more conservative scale factor of 0.20 (0.10 for analyze), which is more suitable for small tables with fewer than 1M tuples and not much activity (few dead tuples produced).
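As a concrete sketch of the proposal, assuming ALTER SYSTEM is acceptable on this cluster (otherwise the same values belong in postgresql.conf), and using one of the candidate tables from the list above:

```sql
-- Keep the aggressive 0.005 scale factor only on the candidate tables.
ALTER TABLE merge_request_diff_files SET (
  autovacuum_vacuum_scale_factor  = 0.005,
  autovacuum_analyze_scale_factor = 0.005
);
-- ...repeat for the other candidate tables in the list above.

-- Relax the global default for every other table, then watch
-- pg_stat_all_tables.n_dead_tup and autovacuum_count day by day.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor  = 0.01;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.01;
SELECT pg_reload_conf();
```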