
Pause data migrations upon active autovacuum

Andreas Brandl requested to merge ab/pause-on-vacuum into master

What does this MR do and why?

This change adds the ability to detect an active autovacuum process and put the related data migration on hold for a while. The idea is to reduce pressure on the system by pausing migrations until the system "recovers".

In general, we have these components:

  1. Indicator: An indicator examines the system state and returns a Signal
  2. Signal: Signals a desired action, e.g. "stop migration", or "all good"
  3. HealthStatus: The migration runner calls this module after every job execution. It evaluates the indicators and returns a signal. Based on that signal, the migration runner decides how to adapt the migration parameters (put the migration on hold or optimize it).
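
The relationship between the three components can be sketched roughly as follows. This is a minimal illustration, not the code from this MR: every class, module, and method name here (`Signals`, `StaticIndicator`, `HealthStatus.evaluate`, `next_action`) is hypothetical, and the choice to treat a failing indicator as "all good" is an assumption.

```ruby
# Hypothetical names throughout; a sketch of the three components only.

# Signal: the action an indicator asks the runner to take.
module Signals
  Normal = Struct.new(:reason) do
    def stop?
      false
    end
  end

  Stop = Struct.new(:reason) do
    def stop?
      true
    end
  end
end

# Indicator: examines (here: fake, injected) system state and returns a Signal.
class StaticIndicator
  def initialize(healthy)
    @healthy = healthy
  end

  def evaluate
    if @healthy
      Signals::Normal.new("all good")
    else
      Signals::Stop.new("stop migration")
    end
  end
end

# HealthStatus: called by the migration runner after every job execution.
module HealthStatus
  def self.evaluate(indicator)
    indicator.evaluate
  rescue StandardError => e
    # Assumption: a broken indicator should not block the migration.
    Signals::Normal.new("indicator failed: #{e.message}")
  end
end

# Runner side: turn the signal into a decision.
def next_action(signal)
  signal.stop? ? :put_on_hold : :continue
end
```

With this shape, the runner only ever sees a signal, so adding further indicators later would not change how it makes the decision.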

In this case, the indicator detects an active autovacuum process. If autovacuum is active on the same table the migration works on, the indicator signals "stop migration", and the migration is put on hold for 10 minutes.
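
As a rough illustration of the autovacuum check and the 10-minute hold, consider the sketch below. It does not query a database: it takes a list of query strings as they might appear in `pg_stat_activity` and checks them against the migration's table. The function names, the regular expression, and the assumption that autovacuum workers report a query text of the form `autovacuum: VACUUM <schema>.<table>` are illustrative and not taken from this MR.

```ruby
# Hold duration from the description: 10 minutes.
HOLD_DURATION_SECONDS = 10 * 60

# Assumption: autovacuum workers show up with a query text such as
# "autovacuum: VACUUM public.issues (to prevent wraparound)".
AUTOVACUUM_PATTERN = /\Aautovacuum: VACUUM(?: ANALYZE)? (?:(\w+)\.)?(\w+)/

# True if any of the given activity query strings is an autovacuum
# running on exactly the given table.
def autovacuum_active_on?(table, activity_queries)
  activity_queries.any? do |query|
    match = AUTOVACUUM_PATTERN.match(query)
    match && match[2] == table
  end
end

# Returns the time until which the migration stays on hold,
# or nil if no autovacuum is active on its table.
def on_hold_until(table, activity_queries, now)
  return nil unless autovacuum_active_on?(table, activity_queries)

  now + HOLD_DURATION_SECONDS
end
```

For example, with `["autovacuum: VACUUM public.issues"]` as the activity list, a migration on `issues` would be held for 600 seconds from `now`, while a migration on any other table would continue.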

A follow-up, Pause batched background migration when WAL pen... (!84555 - merged), depends on this one. It adds a second indicator (WAL segments pending archival) and updates the code to support multiple indicators.

See #357248 (closed).

Edited by Krasimir Angelov
