Add helper to check for wraparound vacuum processes
What does this MR do and why?
Related to gitlab-com/gl-infra/production#8588 (closed)
Autovacuum to prevent wraparound on ci_builds table takes around 10 hours in production and it can block migrations that try to lock the table at the same time. This helper can be used to fail the migration early with a more helpful error message:
class BlockCiBuilds < Gitlab::Database::Migration[2.1]
enable_lock_retries!
def up
check_if_wraparound_in_progress(:ci_builds)
change_column_default :ci_builds, :partition_id, from: 100, to: nil
end
end
Would log a message similar to this for each lock iteration if the vacuum is running:
> bin/rails db:migrate:ci RAILS_ENV=test
ci: == [advisory_lock_connection] object_id: 270380, pg_backend_pid: 63282
ci: == 20230413121736 BlockCiBuilds: migrating ====================================
ci: -- Autovacuum with wraparound prevention mode is running on `ci_builds`
ci: -> This process prevents the migration from acquiring the necessary locks
ci: -> Query: `autovacuum: VACUUM public.ci_builds (to prevent wraparound)`
ci: -> Current duration: "2 hours, 31 minutes, and 12.316804 seconds"
ci: -> Process id: 178
ci: -> You can wait until it completes or if absolutely necessary interrupt it using: `select pg_cancel_backend(178);`
ci: -> Be aware that a new process will kick in immediately, so multiple interruptions might be required to time right it with the locks retry mechanism
ci: -- change_column_default(:ci_builds, :partition_id, {:from=>100, :to=>nil})
ci: == 20230413121736 BlockCiBuilds: migrating ====================================
ci: -- Autovacuum with wraparound prevention mode is running on `ci_builds`
ci: -> This process prevents the migration from acquiring the necessary locks
ci: -> Query: `autovacuum: VACUUM public.ci_builds (to prevent wraparound)`
ci: -> Current duration: "2 hours, 31 minutes, and 12.316804 seconds"
ci: -> Process id: 178
ci: -> You can wait until it completes or if absolutely necessary interrupt it using: `select pg_cancel_backend(178);`
ci: -> Be aware that a new process will kick in immediately, so multiple interruptions might be required to time right it with the locks retry mechanism
ci: -- change_column_default(:ci_builds, :partition_id, {:from=>100, :to=>nil})
ci: == 20230413121736 BlockCiBuilds: migrating ====================================
ci: -- Autovacuum with wraparound prevention mode is running on `ci_builds`
ci: -> This process prevents the migration from acquiring the necessary locks
ci: -> Query: `autovacuum: VACUUM public.ci_builds (to prevent wraparound)`
ci: -> Current duration: "2 hours, 31 minutes, and 12.316804 seconds"
ci: -> Process id: 178
ci: -> You can wait until it completes or if absolutely necessary interrupt it using: `select pg_cancel_backend(178);`
ci: -> Be aware that a new process will kick in immediately, so multiple interruptions might be required to time right it with the locks retry mechanism
ci: -- change_column_default(:ci_builds, :partition_id, {:from=>100, :to=>nil})
How to set up and validate locally
- create a thin clone
- change vacuum settings to a lower value:
alter table ci_builds set(autovacuum_freeze_max_age = 224678100);
- update a record to start the vacuum
update ci_builds set finished_at = now() where id = 4024623537;
- check that the vacuum was started:
select * from pg_stat_activity where query ilike '%ci_builds%wraparound%' and backend_type ilike 'autovacuum%';
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Edited by Marius Bobin