
Remove soft removals related code

Yorick Peterse requested to merge remove-soft-removals into master

What does this MR do?

This MR removes the code related to soft removals (based on the paranoia gem), fixing the problems outlined in https://gitlab.com/gitlab-org/gitlab-ce/issues/37447.

EE MR: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/3957

TODO

  • Migration to hard remove any previously soft removed rows
  • Remove the various deleted_at columns
  • Test the migration on staging

Migration Output

From staging:

==  RemoveSoftRemovedObjects: migrating =======================================
-- execute("SET statement_timeout TO 0")
   -> 0.0018s
-- index_exists?("issues", {:name=>"index_on_issues_tmp"})
   -> 0.0231s
-- Creating temporary index index_on_issues_tmp
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0048s
-- add_index("issues", [:deleted_at, :id], {:name=>"index_on_issues_tmp", :where=>"deleted_at IS NOT NULL", :algorithm=>:concurrently})
   -> 10.7181s
   -> 10.7232s
-- index_exists?("merge_requests", {:name=>"index_on_merge_requests_tmp"})
   -> 0.0200s
-- Creating temporary index index_on_merge_requests_tmp
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0008s
-- add_index("merge_requests", [:deleted_at, :id], {:name=>"index_on_merge_requests_tmp", :where=>"deleted_at IS NOT NULL", :algorithm=>:concurrently})
   -> 15.9529s
   -> 15.9540s
-- index_exists?("ci_pipeline_schedules", {:name=>"index_on_ci_pipeline_schedules_tmp"})
   -> 0.0049s
-- Creating temporary index index_on_ci_pipeline_schedules_tmp
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0008s
-- add_index("ci_pipeline_schedules", [:deleted_at, :id], {:name=>"index_on_ci_pipeline_schedules_tmp", :where=>"deleted_at IS NOT NULL", :algorithm=>:concurrently})
   -> 0.3260s
   -> 0.3271s
-- index_exists?("ci_triggers", {:name=>"index_on_ci_triggers_tmp"})
   -> 0.0033s
-- Creating temporary index index_on_ci_triggers_tmp
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0007s
-- add_index("ci_triggers", [:deleted_at, :id], {:name=>"index_on_ci_triggers_tmp", :where=>"deleted_at IS NOT NULL", :algorithm=>:concurrently})
   -> 0.5908s
   -> 0.5918s
-- Removed soft removed rows from issues
   -> 62.7203s
-- Removed soft removed rows from merge_requests
   -> 305.9385s
-- Removed soft removed rows from ci_pipeline_schedules
   -> 34.2348s
-- Removed soft removed rows from ci_triggers
   -> 260.4303s
-- index_exists?("issues", {:name=>"index_on_issues_tmp"})
   -> 0.0188s
-- index_exists?("merge_requests", {:name=>"index_on_merge_requests_tmp"})
   -> 0.0219s
-- index_exists?("ci_pipeline_schedules", {:name=>"index_on_ci_pipeline_schedules_tmp"})
   -> 0.0052s
-- index_exists?("ci_triggers", {:name=>"index_on_ci_triggers_tmp"})
   -> 0.0038s
==  RemoveSoftRemovedObjects: migrated (691.0414s) ============================

This run involved some retries of the migration, so the total migration time was longer than the 691 seconds reported above. On production the various DELETE queries this migration runs complete much quicker than on staging, so it's hard to say how long the migration will take there.

Impact-wise, I'm not seeing any increase in replication lag or load when removing, for example, 10 000 soft-removed issues. There is an increase in disk IO wait, but that's about it, and it is somewhat to be expected when performing a lot of writes.
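
For readers who want the shape of the migration without reading the diff, the steps visible in the staging log (a temporary partial index per table, removal of the soft removed rows, then index cleanup) can be sketched roughly as below. This is not the migration's actual code: the helper names, the batch size, the batched DELETE shape, and the final DROP INDEX are assumptions for illustration. Only the table names, the index names, and the index definition are taken from the log.

```ruby
# Hypothetical sketch: builds the SQL a migration like RemoveSoftRemovedObjects
# might run. It only produces strings; nothing is executed against a database.

# Table names as they appear in the staging log.
TABLES = %w[issues merge_requests ci_pipeline_schedules ci_triggers].freeze

# Index naming follows the log, e.g. "index_on_issues_tmp".
def tmp_index_name(table)
  "index_on_#{table}_tmp"
end

# Partial index on (deleted_at, id), matching the add_index call in the log.
def create_tmp_index_sql(table)
  "CREATE INDEX CONCURRENTLY #{tmp_index_name(table)} " \
    "ON #{table} (deleted_at, id) WHERE deleted_at IS NOT NULL"
end

# Assumed DELETE shape: PostgreSQL's DELETE has no LIMIT clause, so a batch is
# selected through a subquery. This statement would be repeated until it
# affects zero rows.
def delete_batch_sql(table, batch_size)
  "DELETE FROM #{table} WHERE id IN (" \
    "SELECT id FROM #{table} WHERE deleted_at IS NOT NULL " \
    "ORDER BY id LIMIT #{Integer(batch_size)})"
end

# Assumed cleanup: the log only shows an index_exists? check at the end.
def drop_tmp_index_sql(table)
  "DROP INDEX CONCURRENTLY IF EXISTS #{tmp_index_name(table)}"
end

# Same ordering as the log: all indexes first, then all removals, then cleanup.
def migration_plan(tables = TABLES, batch_size: 1_000)
  tables.map { |t| create_tmp_index_sql(t) } +
    tables.map { |t| delete_batch_sql(t, batch_size) } +
    tables.map { |t| drop_tmp_index_sql(t) }
end
```

Calling `migration_plan.each { |sql| puts sql }` prints the twelve statements in order. Removing rows in batches rather than with one large DELETE per table keeps each statement short, which fits the observation above that replication lag and load stayed flat.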

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

https://gitlab.com/gitlab-org/gitlab-ce/issues/37447

Edited by Yorick Peterse
