Skip to content

Add helper to check for wraparound vacuum processes

Marius Bobin requested to merge mb-wraparound-vacuum into master

What does this MR do and why?

Related to gitlab-com/gl-infra/production#8588 (closed)

Autovacuum to prevent wraparound on ci_builds table takes around 10 hours in production and it can block migrations that try to lock the table at the same time. This helper can be used to fail the migration early with a more helpful error message:

class BlockCiBuilds < Gitlab::Database::Migration[2.1]
  enable_lock_retries!

  def up
    check_if_wraparound_in_progress(:ci_builds)
    change_column_default :ci_builds, :partition_id, from: 100, to: nil
  end
end

Would log a message similar to this for each lock iteration if the vacuum is running:

> bin/rails db:migrate:ci RAILS_ENV=test
ci: == [advisory_lock_connection] object_id: 270380, pg_backend_pid: 63282
ci: == 20230413121736 BlockCiBuilds: migrating ====================================
ci: -- Autovacuum with wraparound prevention mode is running on `ci_builds`
ci:    -> This process prevents the migration from acquiring the necessary locks
ci:    -> Query: `autovacuum: VACUUM public.ci_builds (to prevent wraparound)`
ci:    -> Current duration: "2 hours, 31 minutes, and 12.316804 seconds"
ci:    -> Process id: 178
ci:    -> You can wait until it completes or if absolutely necessary interrupt it using: `select pg_cancel_backend(178);`
ci:    -> Be aware that a new process will kick in immediately, so multiple interruptions might be required to time right it with the locks retry mechanism
ci: -- change_column_default(:ci_builds, :partition_id, {:from=>100, :to=>nil})
ci: == 20230413121736 BlockCiBuilds: migrating ====================================
ci: -- Autovacuum with wraparound prevention mode is running on `ci_builds`
ci:    -> This process prevents the migration from acquiring the necessary locks
ci:    -> Query: `autovacuum: VACUUM public.ci_builds (to prevent wraparound)`
ci:    -> Current duration: "2 hours, 31 minutes, and 12.316804 seconds"
ci:    -> Process id: 178
ci:    -> You can wait until it completes or if absolutely necessary interrupt it using: `select pg_cancel_backend(178);`
ci:    -> Be aware that a new process will kick in immediately, so multiple interruptions might be required to time right it with the locks retry mechanism
ci: -- change_column_default(:ci_builds, :partition_id, {:from=>100, :to=>nil})
ci: == 20230413121736 BlockCiBuilds: migrating ====================================
ci: -- Autovacuum with wraparound prevention mode is running on `ci_builds`
ci:    -> This process prevents the migration from acquiring the necessary locks
ci:    -> Query: `autovacuum: VACUUM public.ci_builds (to prevent wraparound)`
ci:    -> Current duration: "2 hours, 31 minutes, and 12.316804 seconds"
ci:    -> Process id: 178
ci:    -> You can wait until it completes or if absolutely necessary interrupt it using: `select pg_cancel_backend(178);`
ci:    -> Be aware that a new process will kick in immediately, so multiple interruptions might be required to time right it with the locks retry mechanism
ci: -- change_column_default(:ci_builds, :partition_id, {:from=>100, :to=>nil})

How to set up and validate locally

  1. create a thin clone
  2. change vacuum settings to a lower value: alter table ci_builds set(autovacuum_freeze_max_age = 224678100);
  3. update a record to start the vacuum update ci_builds set finished_at = now() where id = 4024623537;
  4. check that the vacuum was started: select * from pg_stat_activity where query ilike '%ci_builds%wraparound%' and backend_type ilike 'autovacuum%';

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Marius Bobin

Merge request reports