Analyze risks of individual tuning of autovacuum for issues, notes, merge_requests tables – disk IO, CPU
Continued from: production#5809 (comment 716494568)
There is a proposal from @msmiley to tune the intensity of autovacuum activities for individual tables – issues, notes, and merge_requests:
/* Unthrottle autovacuum */
ALTER TABLE <table_name> SET ( autovacuum_vacuum_cost_delay = 0 ) ;
/* Reduce the size of gin pending lists (this storage parameter is in kB: 2048 kB = 2 MiB; 2097152 would mean 2 GiB) */
ALTER INDEX <index_name> SET ( gin_pending_list_limit = 2048 ) ;
The goal is to get rid of long-running autoANALYZE runs on these tables (dozens of minutes, sometimes more than 1h). Such events hurt database performance because they cause xmin age spikes, which lead to a) the risk of higher table growth, and b) SubtransControlLock contention on standbys.
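To check whether an autovacuum worker on one of these tables is currently holding back the xmin horizon, a query along the following lines may help. This is a minimal sketch, assuming PostgreSQL 10 or newer (for the `backend_type` column); the regular expression is only an approximate filter on the worker's reported query text.

```sql
-- Autovacuum workers currently processing issues, notes, or merge_requests,
-- with how long they have been running and how old their xmin is.
SELECT pid,
       now() - xact_start AS xact_duration,
       age(backend_xmin)  AS xmin_age,
       query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker'
  -- \M is an end-of-word anchor, so e.g. "issues_foo" is not matched
  AND query ~ '\.(issues|notes|merge_requests)\M'
ORDER BY xact_start;
```

A large `xmin_age` on a row whose query text mentions ANALYZE would be consistent with the spikes described above.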
Previous discussions:
- production#5809 (comment 716106440)
- slack (internal)
- another proposal – a plpgsql function performing unthrottled ANALYZE (w/o VACUUM) right before autoANALYZE would happen: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14360#note_699607687
This issue was created to analyze the risks of such tuning. It is impossible to unthrottle only the autoANALYZE part of autovacuum activities: if we remove throttling (setting autovacuum_vacuum_cost_delay = 0), the VACUUM part (the heaviest one) becomes unthrottled too for those specific tables. This may lead to higher resource utilization, first of all disk IO (CPU too, but we don't expect a big impact there).
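For the risk analysis it may be useful to record the per-table settings and autovacuum activity before and after the change, and to have the rollback ready. The statements below are a sketch under the assumption that the three tables are reachable via the current search_path; `<table_name>` is the same placeholder as in the proposal.

```sql
-- Per-table storage parameter overrides currently in effect
-- (NULL reloptions means the table uses the global defaults).
SELECT c.relname, c.reloptions
FROM pg_class AS c
WHERE c.oid IN ('issues'::regclass, 'notes'::regclass, 'merge_requests'::regclass);

-- Recent autovacuum / autoanalyze activity for the same tables.
SELECT relname,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze,
       autovacuum_count,
       autoanalyze_count
FROM pg_stat_user_tables
WHERE relname IN ('issues', 'notes', 'merge_requests');

-- Rollback: return the table to the global throttling settings.
ALTER TABLE <table_name> RESET ( autovacuum_vacuum_cost_delay );
```

Comparing these snapshots with disk IO graphs for the same time windows should show whether unthrottled VACUUM runs on these tables line up with IO spikes.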