Disable deletions by default in background verifier

Sami Hiltunen requested to merge smh-disable-verifier-deletion into master

Praefect's background verifier goes through the replica metadata records and verifies the replicas still exist. If not, it deletes their metadata records so no traffic is routed to them and they'll be reconciled.

Because renames are not atomic in Praefect, there is a race that can cause the verifier to remove metadata records that belong to renamed repositories. The rename process first renames the repository on disk and afterwards updates the metadata record to match. If the verifier checks the replica after it has been renamed on disk but before its metadata record has been updated, the verifier erroneously removes the record. This can be a problem with Geo in particular, as it renames a lot of repositories, and with soft deletions.

To prevent these erroneous removals while still getting data from production by running the verifier, this commit disables the removal behavior. The metrics and logs produced still reflect the invalidity of the metadata record, but the record itself is not deleted. The replica is instead marked as successfully verified so the verifier can proceed with checking other replicas. The invalid replicas will be found again after the verification interval has passed.

There is a fix pending to make renames atomic, which will do away with this problem. Until that lands, the deletion logic should not be enabled by default.


As the deletions are disabled by default, we can now enable the background worker by default. This MR sets the verification interval to seven days. This allows Praefect to start logging and producing metrics on invalid metadata before we enable the deletions.


Pending atomicity fixes: !4101 (merged)
Documentation changes: gitlab!86652 (merged)
Omnibus changes: omnibus-gitlab!6081 (merged)
Closes: #4211 (closed)

Edited by Sami Hiltunen