Respond to DB health in background migrations

Yorick Peterse requested to merge background-migrations-system-load into master

What does this MR do?

This MR changes BackgroundMigrationWorker so it backs off if the database is in an unhealthy state. This brings us two benefits:

  1. We don't add additional pressure on the database if it's already overloaded.
  2. Because of the extra safety check, we can lower the minimum interval to 2 minutes (= 2x the vacuuming interval). This allows simple migrations to complete about 2.5 times faster, while big migrations won't be able to blow up the DB.

Combined, this should allow us to migrate large tables more comfortably (and slightly faster).
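
To illustrate the back-off behaviour described above, here is a minimal sketch in plain Ruby. It is not the code from this MR: the class name `BackoffWorker`, the `health_check` callable, and the in-memory `rescheduled` list are all hypothetical stand-ins, and the real worker would reschedule through its job queue instead.

```ruby
# Hypothetical sketch: none of these names are taken from the actual MR.
class BackoffWorker
  # Delay in seconds before retrying; mirrors the two-minute minimum above.
  MIN_INTERVAL = 120

  attr_reader :rescheduled

  # health_check: any object responding to #call that returns true when the
  # database is considered healthy (for example, replication lag is low).
  def initialize(health_check)
    @health_check = health_check
    @rescheduled = []
  end

  # Runs the job when the database looks healthy. Otherwise the job is not
  # run; it is recorded for a retry after MIN_INTERVAL seconds.
  def perform(job)
    if @health_check.call
      job.call
      :performed
    else
      @rescheduled << [job, MIN_INTERVAL]
      :rescheduled
    end
  end
end
```

The point of the design is that an unhealthy database costs only one cheap check per attempt, so the migration itself never adds load while the system is struggling.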

Why was this MR needed?

Background migrations had an interval of 5 minutes, but never checked the state of the system. This could result in a migration running while the database was still suffering from severe replication lag.

Does this MR meet the acceptance criteria?

TODO

  • Consider using the autovacuum_naptime setting for the minimum, instead of hard-coding this to two minutes. We'll just start with two minutes. On GitLab.com the naptime is 1 minute, which I fear is too short as we might start running new migrations when vacuuming is still running.
  • Add documentation
  • Lower the minimum scheduling interval to 2 minutes in the database migration helpers
  • Backport WAL/LSN functions from EE to CE
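
As a rough illustration of the first TODO item, the minimum interval could be derived from the naptime and never allowed to drop below two minutes. The helper name `minimum_interval`, the multiplier of 2, and the 120-second floor are assumptions made for this sketch; reading the actual autovacuum_naptime value from the database is left out.

```ruby
# Hypothetical helper; the multiplier and floor are assumptions, not MR code.
FLOOR_SECONDS = 120

# Returns the minimum scheduling interval in seconds: a multiple of the
# autovacuum naptime, but never less than the floor.
def minimum_interval(naptime_seconds, multiplier: 2, floor: FLOOR_SECONDS)
  [naptime_seconds * multiplier, floor].max
end

minimum_interval(60)  # => 120 (a 1-minute naptime gives the 2-minute minimum)
minimum_interval(300) # => 600
minimum_interval(10)  # => 120 (the floor applies)
```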
Edited by Yorick Peterse
