Geo: Verification failure should cause resync, not retry of verification
Problem
The Geo Self-Service Framework (SSF) verifies PackageFiles, but when verification fails it currently just retries verification.
For project repos the behavior is basically: if verification fails for any reason, update the registry as if the sync had failed.
- https://gitlab.com/gitlab-org/gitlab/-/blob/v13.8.3-ee/ee/app/services/geo/repository_verification_secondary_service.rb#L51-89
- !6759 (merged)
This automatic healing behavior is very valuable, and already proven on project repos.
Proposal
Let's use similar behavior in the SSF.
To do
- On `verification_state` transition to `verification_failed`, transition `state` to `failed`, setting `retry_count` to `verification_retry_count`, to ensure progressive backoff of syncs-due-to-verification-failures
- Do #301247 (closed). On `state` transition to `synced`, transition `verification_state` to `verification_pending`
  - If transitioning from `verification_failed` then don't clear `verification_retry_count`, to ensure progressive backoff of syncs-due-to-verification-failures
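The transitions listed above can be sketched in plain Ruby. This is only an illustration under assumptions: `RegistrySketch` and its bang methods are hypothetical names, not the SSF's actual registry model or state machine, and the choice to increment `verification_retry_count` on each failure is assumed rather than taken from the issue.

```ruby
# Hypothetical stand-in for an SSF registry record; not GitLab's actual model.
class RegistrySketch
  attr_reader :state, :verification_state, :retry_count, :verification_retry_count

  def initialize
    @state = :pending
    @verification_state = :verification_pending
    @retry_count = 0
    @verification_retry_count = 0
  end

  # Sync succeeded: queue verification again.
  def synced!
    @state = :synced
    @retry_count = 0
    # Keep the counter if this sync was caused by a verification failure,
    # so repeated verification failures keep backing off.
    @verification_retry_count = 0 unless @verification_state == :verification_failed
    @verification_state = :verification_pending
  end

  # Verification failed: treat it as a sync failure so the file is resynced.
  def verification_failed!
    @verification_retry_count += 1 # assumption: one increment per failure
    @verification_state = :verification_failed
    @state = :failed
    @retry_count = @verification_retry_count
  end

  # Verification succeeded: the counter is no longer needed.
  def verification_succeeded!
    @verification_state = :verification_succeeded
    @verification_retry_count = 0
  end
end
```

With this sketch, a record that is synced, fails verification, is resynced, and fails verification again ends up with `retry_count` equal to 2 rather than 1, which is what lets the sync scheduler back off.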
Why worry about "syncs-due-to-verification-failures"?
E.g. the primary checksum is wrong, so the secondary has persistent sync "success" plus verification "failure". The naive behavior here would be a verification/sync loop with no progressive backoff. This problem exists for project repos: #208247
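To make the difference concrete, here is a small simulation of the bad-primary-checksum case. The backoff formula (doubling from 30 seconds, capped at one hour) and both method names are assumptions chosen for illustration; they are not the delays Geo actually uses.

```ruby
# Hypothetical backoff: doubles per retry, capped at one hour.
def backoff_seconds(retry_count, base: 30, cap: 3600)
  [base * (2**retry_count), cap].min
end

# Every sync "succeeds" and every verification fails.
# Returns the delay before each resync.
def resync_delays(cycles, preserve_count:)
  verification_retry_count = 0
  Array.new(cycles) do
    verification_retry_count += 1 # verification fails again
    delay = backoff_seconds(verification_retry_count)
    # Naive behavior: the successful sync clears the counter.
    verification_retry_count = 0 unless preserve_count
    delay
  end
end

resync_delays(4, preserve_count: false) # => [60, 60, 60, 60]
resync_delays(4, preserve_count: true)  # => [60, 120, 240, 480]
```

When the counter is cleared on every successful sync, the loop repeats at a constant interval; preserving it lets the interval grow.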
Edited by Michael Kozono