Geo: Treat missing blobs as sync failures
From #348295 (comment 775321315):
> The missing-on-primary failures are revealed via a confusing loop: Sync "succeeds", verification fails, so sync state changes to failed, repeat.
> I think the improvement should be: When a sync attempt discovers the file is missing, it should mark it as "failed". This may be as easy as removing the second condition from https://gitlab.com/gitlab-org/gitlab/-/blob/073b67bc920a7ae6ff3a3b0eab7f7c7c92b02e26/ee/app/services/geo/blob_download_service.rb#L34
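The suggested change can be modeled as a before/after predicate that decides the registry state after a download attempt. This is a minimal sketch that assumes the download result exposes a success flag and a "missing on primary" flag. `DownloadResult`, `success`, `primary_missing_file`, and both methods are hypothetical names; they are not the actual contents of `blob_download_service.rb`.

```ruby
# Hypothetical result object; field names are illustrative assumptions.
DownloadResult = Struct.new(:success, :primary_missing_file, keyword_init: true)

# Current behavior as described above: a download counts as "synced" if it
# succeeded OR if the file was missing on the primary (the second condition).
def registry_state_before(result)
  (result.success || result.primary_missing_file) ? :synced : :failed
end

# Proposed behavior: drop the second condition, so only a real download
# counts as synced and a file missing on the primary becomes a failure.
def registry_state_after(result)
  result.success ? :synced : :failed
end

missing = DownloadResult.new(success: false, primary_missing_file: true)
puts registry_state_before(missing) # => synced
puts registry_state_after(missing)  # => failed
```

Successful downloads are unaffected by the change; only the missing-on-primary case moves from "synced" to "failed".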
Problem
When a blob is missing on the primary and a secondary attempts to sync it, the secondary considers it "synced", since its state (no file) matches the primary. For blob types that have Geo verification enabled, this causes an inefficient logical loop in which:
- sync succeeds
- verification fails
- sync becomes failed
- sync gets retried
- repeat
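The loop above can be modeled as a tiny state machine for a single blob that is missing on the primary. This is an illustrative sketch, not Geo code: `SecondaryBlob`, `sync!`, `verify!`, and the state symbols are hypothetical. It records the sequence of states so the old and proposed behaviors can be compared.

```ruby
# Hypothetical model of one blob that is missing on the primary.
class SecondaryBlob
  attr_reader :history

  def initialize(treat_missing_as_failure:)
    @treat_missing_as_failure = treat_missing_as_failure
    @history = []
  end

  # One sync attempt. Nothing can be downloaded because the primary lacks the file.
  def sync!
    if @treat_missing_as_failure
      @history << :sync_failed # proposed: record the failure directly
      return
    end
    @history << :synced # old: "synced" because the secondary matches the primary
    verify!
  end

  private

  # Verifying a "synced" file that does not exist always fails,
  # which in turn flips the sync state to failed and schedules a retry.
  def verify!
    @history << :verification_failed
    @history << :sync_failed
  end
end

old = SecondaryBlob.new(treat_missing_as_failure: false)
3.times { old.sync! } # each retry repeats the whole cycle
p old.history

proposed = SecondaryBlob.new(treat_missing_as_failure: true)
3.times { proposed.sync! } # each retry records a single failure
p proposed.history
```

Under the old behavior every retry passes through a misleading "synced" state and a wasted verification; under the proposed behavior each attempt is simply a failure.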
This loop affects blobs replicated and verified by the Geo Self-Service Framework, including:
- Package Files
- Terraform State Versions
- Pipeline Artifacts
- LFS Objects (since 14.6)
- Pages Deployments (since 14.6)
- Uploads (since 14.6)
And soon to include:
- CI Job Artifacts
Additionally, the current behavior "hides" the problem more than we would like. Geo has raised the bar for data integrity in GitLab since the "synced but missing on primary" behavior was first implemented.
Proposal
- When a blob is missing on the primary and a secondary attempts to sync it, the secondary should consider the sync "failed", since it was unable to sync the file and this is an undesirable state. => !76801 (merged) (the rollout issue #348590 (closed) tracks enabling this)
This is a small but significant design change to blob replication by the Geo Self-Service Framework.
To do
- Improve the failure reason => !77203 (merged)
- Increase the exponential backoff cap of failure retries from 1 hour to 4 hours when the SSF records a sync failure due to a file missing on the primary => !77208 (merged)
- We should also document this sync-verification loop in Geo Troubleshooting. => !77264 (merged)
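The backoff change in the list above can be illustrated with a capped exponential backoff. This is a minimal sketch under stated assumptions: only the two caps (1 hour by default, 4 hours for missing-on-primary failures) come from this issue. The 5-minute base delay, the exponent clamp, and the `retry_delay` method are hypothetical and do not reflect GitLab's actual retry code.

```ruby
# Assumed base delay; not taken from GitLab.
BASE_DELAY = 5 * 60 # seconds
# Caps stated in this issue.
DEFAULT_CAP = 60 * 60                # 1 hour
MISSING_ON_PRIMARY_CAP = 4 * 60 * 60 # 4 hours

# Hypothetical helper: delay before the next retry, doubling per attempt
# and clamped to a cap that depends on the failure type.
def retry_delay(retry_count, missing_on_primary:)
  cap = missing_on_primary ? MISSING_ON_PRIMARY_CAP : DEFAULT_CAP
  # Clamp the exponent so the intermediate value stays bounded.
  delay = BASE_DELAY * (2**([retry_count, 32].min))
  [delay, cap].min
end

puts retry_delay(4, missing_on_primary: false)  # => 3600
puts retry_delay(4, missing_on_primary: true)   # => 4800
puts retry_delay(10, missing_on_primary: true)  # => 14400
```

A higher cap for missing-on-primary failures makes sense because such files are unlikely to reappear quickly, so retrying them less often reduces wasted work without hiding the failure.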