
Geo: Verification failure should cause resync, not retry of verification

Problem

The Geo Self-Service Framework (SSF) verifies PackageFiles, but currently it only retries verification when verification fails.

For project repos, the behavior is essentially: if verification fails for any reason, update the registry as if the sync had failed, which causes a resync.

This automatic healing behavior is very valuable and has already proven itself on project repos.

Proposal

Let's use similar behavior in the SSF.

To do

  • On verification_state transition to verification_failed, transition state to failed and set retry_count to verification_retry_count, to ensure progressive backoff of syncs-due-to-verification-failures
  • Do #301247 (closed). On state transition to synced, transition verification_state to verification_pending
    • If transitioning from verification_failed then don't clear verification_retry_count, to ensure progressive backoff of syncs-due-to-verification-failures
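The To do items above can be sketched as a small state model. This is a minimal Python sketch, not the SSF's actual (Ruby) implementation: the class `RegistryRecord` and its methods are hypothetical, and only the field names and the states `failed`, `synced`, `verification_failed` and `verification_pending` come from this issue. It also assumes that a successful sync resets retry_count and that a successful verification (a hypothetical `verification_succeeded` state) clears verification_retry_count.

```python
from dataclasses import dataclass


@dataclass
class RegistryRecord:
    # Hypothetical sketch: field names follow the issue, the class itself is illustrative.
    state: str = "pending"
    retry_count: int = 0
    verification_state: str = "verification_pending"
    verification_retry_count: int = 0

    def fail_verification(self) -> None:
        """Verification failed: count it, then mark the sync as failed so a resync happens."""
        self.verification_state = "verification_failed"
        self.verification_retry_count += 1
        # Treat it as a sync failure, carrying the verification retry count over
        # so the resync is scheduled with progressive backoff.
        self.state = "failed"
        self.retry_count = self.verification_retry_count

    def sync_succeeded(self) -> None:
        """Sync succeeded: queue the record for verification again."""
        came_from_verification_failure = self.verification_state == "verification_failed"
        self.state = "synced"
        self.retry_count = 0  # assumption: a successful sync resets the sync retry count
        self.verification_state = "verification_pending"
        # Keep the counter when resyncing after a verification failure, so backoff keeps growing.
        if not came_from_verification_failure:
            self.verification_retry_count = 0

    def verification_succeeded(self) -> None:
        """Hypothetical success transition: the loop is broken, so clear the counter."""
        self.verification_state = "verification_succeeded"
        self.verification_retry_count = 0


# A secondary whose syncs always "succeed" but whose verification always fails:
record = RegistryRecord()
for _ in range(3):
    record.sync_succeeded()
    record.fail_verification()
print(record.state, record.retry_count)  # failed 3
```

Because verification_retry_count survives the `synced` transition, each verification failure pushes retry_count one step higher instead of restarting at 1.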

Why worry about "syncs-due-to-verification-failures"?

For example, if the checksum on the primary is wrong, the secondary sees persistent sync "success" but verification "failure". The naive behavior here would be a verification/sync loop with no progressive backoff. This problem already exists for project repos: #208247
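To illustrate why the counter must survive a successful sync, here is a small simulation of the loop described above. It is a sketch under stated assumptions: `next_retry_delay_seconds` uses a hypothetical exponential backoff (60 s doubled per retry, capped at one week), which is not necessarily Geo's real delay formula, and `simulate_loop` is an illustrative helper.

```python
from typing import List


def next_retry_delay_seconds(retry_count: int, base: int = 60, cap: int = 7 * 24 * 3600) -> int:
    """Hypothetical exponential backoff: base * 2**retry_count, capped at `cap` seconds."""
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    # Python ints do not overflow, so min() caps large values safely.
    return min(base * 2 ** retry_count, cap)


def simulate_loop(cycles: int, preserve_count: bool) -> List[int]:
    """Delays before each resync when sync keeps 'succeeding' but verification keeps failing."""
    verification_retry_count = 0
    delays = []
    for _ in range(cycles):
        verification_retry_count += 1  # verification failed again
        delays.append(next_retry_delay_seconds(verification_retry_count))
        if not preserve_count:
            verification_retry_count = 0  # naive: a successful sync clears the counter
    return delays


print(simulate_loop(4, preserve_count=False))  # [120, 120, 120, 120]
print(simulate_loop(4, preserve_count=True))   # [120, 240, 480, 960]
```

With the naive behavior, every cycle waits the same short delay forever; preserving the count lets the delay grow until it reaches the cap.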

Edited by Michael Kozono