Pause migration while autovacuum is running for the table

Extracted from #353395 (comment 892991285)

Overview

Indicator: Active autovacuum on the table the migration works on (yes/no)
Source: query pg_stat_activity on the primary
Action: Pause migration while autovacuum is running on the same table the migration works on
Needs Prometheus: No (see Caveat)

This is a higher-level, table-level indicator. Its thresholds are tunable and should ideally already be set to sane values on the system side, since that is needed to get good autovacuum results in the first place. An active autovacuum on a table can be read as a sign of a high rate of churn, so we would pause further updates from the migration until the autovacuum has finished.
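A minimal sketch of how this indicator could be evaluated, assuming PostgreSQL 10 or later (where pg_stat_activity has a backend_type column) and that autovacuum workers report their activity as query text starting with "autovacuum:". The constant and function names are hypothetical and not taken from the GitLab codebase. The database round trip is left out so the matching logic stands on its own.

```python
import re

# Assumed query against the primary. It requires permission to read other
# sessions' query text in pg_stat_activity (see the caveat in this issue).
AUTOVACUUM_ACTIVITY_SQL = """
SELECT query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker'
"""

# Autovacuum workers report e.g. "autovacuum: VACUUM public.issues" or
# "autovacuum: VACUUM ANALYZE public.issues (to prevent wraparound)".
# This pattern also counts autovacuum-triggered ANALYZE runs; narrow it if
# only VACUUM should pause the migration.
_AUTOVACUUM_RE = re.compile(
    r"^autovacuum:\s+(?:VACUUM(?:\s+ANALYZE)?|ANALYZE)\s+(?P<relation>\S+)"
)


def autovacuum_active_on(table, activity_queries, schema="public"):
    """Return True if any query text shows autovacuum working on schema.table."""
    target = f"{schema}.{table}"
    for query in activity_queries:
        match = _AUTOVACUUM_RE.match(query or "")
        if match and match.group("relation") == target:
            return True
    return False
```

The caller would run AUTOVACUUM_ACTIVITY_SQL, pass the resulting query column to autovacuum_active_on, and treat a True result as the "yes" state of the indicator. Table names that need quoting are not handled in this sketch.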

Caveat ❗

Querying pg_stat_activity is currently not possible on GitLab.com for the gitlab user (permission denied). We will have to either grant the necessary permissions or work around this (e.g. with a custom function or an alternative way of getting this information).

Alternatively, this information is available through Prometheus: max(pg_stat_activity_autovacuum_age_in_seconds{env="gprd"}) by (relname)
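As a rough sketch of the Prometheus route, the function below reads the JSON body returned by an instant query (the Prometheus HTTP API endpoint /api/v1/query) for the expression above. It assumes that a series with a positive value is present only while an autovacuum is running on that relation, which should be verified against the exporter's query. The function name is hypothetical, and fetching the response is left to the caller.

```python
import json

# PromQL from the text above; the env label value is specific to GitLab.com.
AUTOVACUUM_AGE_QUERY = (
    'max(pg_stat_activity_autovacuum_age_in_seconds{env="gprd"}) by (relname)'
)


def tables_under_autovacuum(response_body):
    """Return {relname: age_in_seconds} from an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    ages = {}
    for sample in payload["data"]["result"]:
        relname = sample["metric"].get("relname")
        # Instant vectors carry one [timestamp, "value"] pair per series.
        _timestamp, value = sample["value"]
        age = float(value)
        if relname and age > 0:
            ages[relname] = age
    return ages
```

A migration working on a given table would then pause whenever that table's name appears as a key in the returned mapping.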

Discussion

With large tables in particular, this means we may pause the data migration for many hours until the autovacuum run has finished. This gives priority to regular application-side updates.
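The pausing behaviour can be sketched as a loop that checks the indicator before every batch. This is only an illustration under assumptions: run_batches and its parameters are hypothetical, and a real migration framework might reschedule the job instead of sleeping in place.

```python
import time


def run_batches(batches, process, autovacuum_running, poll_interval=60.0,
                sleep=time.sleep):
    """Process batches in order, waiting while autovacuum is active.

    autovacuum_running is a zero-argument callable returning True while the
    target table is being autovacuumed. Returns the number of waits performed.
    """
    waits = 0
    for batch in batches:
        # Application-side updates get priority: do not start the next
        # batch until the indicator reports that autovacuum has finished.
        while autovacuum_running():
            sleep(poll_interval)
            waits += 1
        process(batch)
    return waits
```

Because the check happens between batches, a batch that is already running is never interrupted. The migration simply does not start new work while the indicator is "yes".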

The caveat here is that very large tables can lead to very long autovacuum runs, which in turn would dramatically reduce the throughput of data migrations. This is specific to GitLab.com, and we can decide how to handle it once the indicator is implemented and behind a feature flag. In any case, the underlying problem is the large tables and their long autovacuum times, which have many more implications for database health. Slower migrations in these cases are just another sign that large tables are a problem.

Out of scope

Detect whether autovacuum would be necessary (system-side problem) - may be a follow-up
Detect whether all autovacuum workers are busy (system-side problem) - seems unnecessary (current capacity used) and is well monitored

Edited Mar 30, 2022 by Andreas Brandl