Geo: Treat missing files as sync failures

What does this MR do and why?

From #348295 (comment 775321315):

The missing-on-primary failures are revealed via a confusing loop: Sync "succeeds", verification fails, so sync state changes to failed, repeat.

I think the improvement should be: When a sync attempt discovers the file is missing, it should mark it as "failed". This may be as easy as removing the second condition from https://gitlab.com/gitlab-org/gitlab/-/blob/073b67bc920a7ae6ff3a3b0eab7f7c7c92b02e26/ee/app/services/geo/blob_download_service.rb#L34


This is a small but significant design change to files/blobs replicated by the Geo Self-Service Framework.

Before

When a file is missing on the primary, and a secondary attempts to sync it, the secondary considers it "synced", since its state matches the primary.

After

With this change, the secondary considers the sync "failed", since it was unable to sync the file and this is an undesirable state.
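The before/after difference can be sketched as a single decision on the download result. This is a minimal illustration, not the code in `blob_download_service.rb`: the `DownloadResult` struct, its fields, and `sync_state_for` are hypothetical names chosen for this example.

```ruby
# Hypothetical result object; the field names are assumptions for illustration.
DownloadResult = Struct.new(:success, :primary_missing_file, keyword_init: true)

# Returns the sync state a secondary would record for one download attempt.
def sync_state_for(result, treat_missing_as_failed:)
  return :synced if result.success

  # Old behavior: a file missing on the primary still counts as "synced".
  return :synced if result.primary_missing_file && !treat_missing_as_failed

  # New behavior (and any other unsuccessful download): record a failure.
  :failed
end

missing = DownloadResult.new(success: false, primary_missing_file: true)
before = sync_state_for(missing, treat_missing_as_failed: false)
after  = sync_state_for(missing, treat_missing_as_failed: true)
puts "before: #{before}, after: #{after}"
```

In this sketch, dropping the second `return :synced` line is the whole change, which mirrors the "removing the second condition" idea quoted above.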

Other implications

For blob types with Geo verification enabled, this change short-circuits a loop: sync succeeds, verification fails, the sync state becomes failed, and the sync is retried.
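To make the loop concrete, here is a rough simulation of the state transitions for one blob that is missing on the primary. The `Registry` struct and `run_cycle` method are hypothetical stand-ins, not the framework's registry classes; they only record the order of events described above.

```ruby
# Hypothetical registry row tracking sync and verification state for one blob.
Registry = Struct.new(:sync_state, :verification_state, :history)

# Simulates one sync attempt (and any follow-up verification) for a blob
# that is missing on the primary.
def run_cycle(registry, treat_missing_as_failed:)
  if treat_missing_as_failed
    # New behavior: the sync attempt itself is recorded as failed.
    registry.sync_state = :failed
    registry.history << :sync_failed
    return registry
  end

  # Old behavior: sync "succeeds" because the secondary matches the primary.
  registry.sync_state = :synced
  registry.history << :synced

  # Verification then fails.
  registry.verification_state = :failed
  registry.history << :verification_failed

  # The failed verification flips the sync state to failed, so sync is retried.
  registry.sync_state = :failed
  registry.history << :sync_failed
  registry
end

old_flow = Registry.new(:pending, :pending, [])
2.times { run_cycle(old_flow, treat_missing_as_failed: false) }

new_flow = Registry.new(:pending, :pending, [])
2.times { run_cycle(new_flow, treat_missing_as_failed: true) }

puts old_flow.history.inspect
puts new_flow.history.inspect
```

Both flows end each cycle with a failed sync state; the difference is that the new flow gets there directly, without the misleading "synced" step and the extra verification failure.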

This loop affects blobs replicated and verified by the Geo Self-Service Framework, including:

  • Package Files
  • Terraform State Versions
  • Pipeline Artifacts

And soon to include:

  • LFS Objects
  • Pages Deployments
  • Uploads
  • CI Job Artifacts

This change is behind the feature flag geo_treat_missing_files_as_sync_failed so we can test it in staging. I intend to enable the flag by default for a milestone before removing it, so customers can easily switch back to the old behavior during that milestone if this design change causes unforeseen problems.
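The rollout described here can be sketched with a stand-in flag store. `FakeFeature` below is NOT GitLab's `Feature` class; it is a hypothetical in-memory substitute so the example runs on its own, and `missing_file_state` is likewise an illustrative name.

```ruby
require 'set'

# Stand-in for a feature flag store; an assumption for illustration only.
class FakeFeature
  def initialize
    @enabled = Set.new
  end

  def enable(name)
    @enabled << name
  end

  def disable(name)
    @enabled.delete(name)
  end

  def enabled?(name)
    @enabled.include?(name)
  end
end

FLAG = :geo_treat_missing_files_as_sync_failed

# State recorded when the primary reports the file as missing.
def missing_file_state(flags)
  flags.enabled?(FLAG) ? :failed : :synced
end

flags = FakeFeature.new
puts missing_file_state(flags) # flag off: old behavior

flags.enable(FLAG)
puts missing_file_state(flags) # flag on: new behavior

flags.disable(FLAG)            # switching back to the old behavior
puts missing_file_state(flags)
```

On a real GitLab instance, the equivalent toggle would presumably be run from the Rails console with `Feature.enable(:geo_treat_missing_files_as_sync_failed)` or `Feature.disable(:geo_treat_missing_files_as_sync_failed)`.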

Part of #348745 (closed)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono