Geo: Verification failure should cause resync, not retry of verification
Problem
The Geo Self-Service Framework (SSF) verifies PackageFiles, but currently it only retries verification when verification fails.
For project repos, the behavior is essentially: if verification fails for any reason, update the registry as if the sync had failed, so the repo gets resynced.
- https://gitlab.com/gitlab-org/gitlab/-/blob/v13.8.3-ee/ee/app/services/geo/repository_verification_secondary_service.rb#L51-89
- !6759 (merged)
This automatic self-healing behavior is very valuable and has already proven itself on project repos.
Proposal
Let's use similar behavior in the SSF.
To do
- On `verification_state` transition to `verification_failed`, transition `state` to `failed`, setting `retry_count` to `verification_retry_count`, to ensure progressive backoff of syncs-due-to-verification-failures
- Do #301247 (closed). On `state` transition to `synced`, transition `verification_state` to `verification_pending`
- If transitioning from `verification_failed`, then don't clear `verification_retry_count`, to ensure progressive backoff of syncs-due-to-verification-failures
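The transitions in the to-do list can be sketched as a tiny plain-Ruby model. This is a hypothetical illustration, not GitLab's registry code: the class `RegistrySketch` and its bang methods are invented for this example, and only the attribute names (`state`, `verification_state`, `retry_count`, `verification_retry_count`) come from the list above. It also assumes that a successful sync resets `retry_count` and that a successful verification resets `verification_retry_count`.

```ruby
# Hypothetical, simplified model of a Geo registry row. Only the attribute
# names come from the issue; the class and methods are illustrative.
class RegistrySketch
  attr_reader :state, :verification_state, :retry_count, :verification_retry_count

  def initialize
    @state = :pending
    @verification_state = :verification_pending
    @retry_count = 0
    @verification_retry_count = 0
  end

  # verification_state -> verification_failed: mark the sync as failed so the
  # resource gets resynced, and carry the verification retry count over to
  # retry_count so these resyncs back off progressively.
  def verification_failed!
    @verification_state = :verification_failed
    @verification_retry_count += 1
    @state = :failed
    @retry_count = @verification_retry_count
  end

  # state -> synced: queue a fresh verification. Keep verification_retry_count
  # when coming from verification_failed; otherwise clear it.
  def synced!
    previous_verification_state = @verification_state
    @state = :synced
    @retry_count = 0 # assumption: a successful sync resets retry_count
    @verification_state = :verification_pending
    @verification_retry_count = 0 unless previous_verification_state == :verification_failed
  end

  # Assumption: a successful verification clears the verification retry count.
  def verification_succeeded!
    @verification_state = :verification_succeeded
    @verification_retry_count = 0
  end
end
```

In use, each sync-then-failed-verification cycle raises `retry_count` by one instead of resetting it, which is what lets the sync scheduler back off:

```ruby
reg = RegistrySketch.new
reg.synced!
reg.verification_failed! # state :failed, retry_count 1
reg.synced!              # verification_retry_count still 1
reg.verification_failed! # state :failed, retry_count 2
```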
Why worry about "syncs-due-to-verification-failures"?
For example, if the primary's checksum is wrong, the secondary gets a persistent sync "success" combined with a verification "failure". The naive behavior here would be a sync/verification loop with no progressive backoff. This problem already exists for project repos: #208247
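To see why the retry count must survive a successful sync, here is a small simulation of the wrong-checksum case, where every sync succeeds and every verification fails. The backoff formula (5 minutes doubled per retry, capped at one day) and the helper names `backoff_delay` and `resync_delays` are hypothetical, chosen only for illustration; GitLab's actual scheduling may differ.

```ruby
# Hypothetical exponential backoff, not GitLab's actual formula:
# 5 minutes doubled per retry, capped at one day (values in seconds).
BASE_DELAY = 5 * 60
MAX_DELAY = 24 * 60 * 60

def backoff_delay(retry_count)
  [BASE_DELAY * (2**retry_count), MAX_DELAY].min
end

# Simulates `cycles` rounds of sync "success" followed by verification
# "failure" and returns the delay scheduled before each resync.
# clear_on_sync: true models the naive behavior, where a successful sync
# resets the verification retry count.
def resync_delays(cycles, clear_on_sync:)
  verification_retry_count = 0
  Array.new(cycles) do
    verification_retry_count = 0 if clear_on_sync # sync succeeded
    verification_retry_count += 1                 # verification failed
    backoff_delay(verification_retry_count)
  end
end
```

With the naive behavior the delay never grows, so the loop runs at a constant rate; keeping the count across syncs makes each resync wait longer than the last:

```ruby
resync_delays(4, clear_on_sync: true)  # => [600, 600, 600, 600]
resync_delays(4, clear_on_sync: false) # => [600, 1200, 2400, 4800]
```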
Edited by Michael Kozono