Respond to DB health in background migrations
What does this MR do?
This MR changes BackgroundMigrationWorker so it backs off if the database is in an unhealthy state. This brings us two benefits:
- We don't add additional pressure on the database if it's already overloaded.
- Because of the extra safety check, we can lower the minimum interval to 2 minutes (= 2x the vacuuming interval). This allows simple migrations to complete about 2.5 times faster, while big migrations won't be able to blow up the DB.
Combined, this should allow us to migrate large tables more comfortably (and slightly faster).
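To put numbers on the speed-up: if a migration's batches are scheduled one interval apart, total run time grows linearly with the interval. Going from 5 to 2 minutes therefore divides it by 5 / 2 = 2.5. Below is a minimal sketch under that assumption, with per-batch runtime ignored. The helper names are hypothetical, and the 1-minute naptime is the GitLab.com value quoted in this MR.

```python
from datetime import timedelta

# Hypothetical model: batch i is scheduled i * interval after the first batch.
OLD_INTERVAL = timedelta(minutes=5)
AUTOVACUUM_NAPTIME = timedelta(minutes=1)  # GitLab.com value quoted in this MR
NEW_INTERVAL = 2 * AUTOVACUUM_NAPTIME      # "2x the vacuuming interval"


def schedule_delays(batch_count, interval):
    """Return the delay after which each batch is scheduled to run."""
    return [i * interval for i in range(batch_count)]


def total_duration(batch_count, interval):
    """Approximate wall-clock time for all batches (per-batch runtime ignored)."""
    return batch_count * interval


# Dividing two timedeltas yields a float ratio.
speedup = total_duration(100, OLD_INTERVAL) / total_duration(100, NEW_INTERVAL)
print(speedup)  # 2.5
```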
Why was this MR needed?
Background migrations had an interval of 5 minutes, but never bothered to check the state of the system. This could result in a migration running while the database was still suffering from severe replication lag.
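The back-off amounts to a guard in front of each job. It checks health first. If the database is struggling, it pushes the job back by one interval instead of running it. Below is a minimal Python sketch that assumes "healthy" means every replica's replication lag is under a threshold. The class name, the threshold value, and the two callbacks are hypothetical. They do not reflect the actual BackgroundMigrationWorker code.

```python
from datetime import timedelta

# Hypothetical threshold: the MR does not state the exact limit used.
MAX_REPLICATION_LAG_BYTES = 100 * 1024 * 1024  # 100 MiB of unreplayed WAL


class BackoffMigrationWorker:
    """Hypothetical sketch: run a migration only when all replicas are caught up."""

    def __init__(self, replica_lags, reschedule, interval=timedelta(minutes=2)):
        self.replica_lags = replica_lags  # callable returning lag per replica, in bytes
        self.reschedule = reschedule      # callable(migration, delay) that re-enqueues
        self.interval = interval
        self.performed = []

    def database_healthy(self):
        return all(lag <= MAX_REPLICATION_LAG_BYTES for lag in self.replica_lags())

    def perform(self, migration):
        if not self.database_healthy():
            # Back off: re-enqueue one interval later instead of adding load now.
            self.reschedule(migration, self.interval)
            return False
        self.performed.append(migration)  # stand-in for actually running the job
        return True


queue = []
# One replica is 512 MiB behind, so the worker should back off.
worker = BackoffMigrationWorker(lambda: [0, 512 * 1024 * 1024],
                                lambda m, delay: queue.append((m, delay)))
print(worker.perform("ExampleMigration"))  # False
```

In this sketch the job is rescheduled rather than the worker sleeping. That keeps the worker free. It also means an unhealthy database simply stretches the migration out over more intervals.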
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Conforms to the code review guidelines
  - Has been reviewed by a Backend maintainer
  - Has been reviewed by a Database specialist
- Conforms to the merge request performance guidelines
- Conforms to the style guides
- Conforms to the database guides
- If you have multiple commits, please combine them into a few logically organized commits by squashing them
- End-to-end tests pass (package-and-qa manual pipeline job)
TODO
- Consider using the autovacuum_naptime setting for the minimum, instead of hard coding this to two minutes. (We'll just start with two minutes. On GitLab.com the naptime is 1 minute, which I fear is too small, as we might start running new migrations while vacuuming is still running.)
- Add documentation
- Lower the minimum scheduling interval to 2 minutes in the database migration helpers
- Backport WAL/LSN functions from EE to CE
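The WAL/LSN functions make a replication-lag check possible. Lag can be expressed as the number of WAL bytes between the primary's current position and a replica's replay position. PostgreSQL prints an LSN as two hexadecimal halves separated by a slash, such as 16/B374D848, and computes the difference with pg_wal_lsn_diff (pg_xlog_location_diff before PostgreSQL 10). The sketch below shows that arithmetic in Python. The function names are hypothetical and are not the functions being backported.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a 64-bit integer."""
    high, low = lsn.split("/")
    # The part before the slash holds the upper 32 bits, the part after the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica has yet to replay (same idea as pg_wal_lsn_diff)."""
    return parse_lsn(primary_lsn) - parse_lsn(replica_lsn)


print(lsn_diff_bytes("0/3000060", "0/3000000"))  # 96
```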
Edited by Yorick Peterse