
Geo: Increase resync backoff cap for legacy blobs missing on primary

Michael Kozono requested to merge mk/increase-backoff-cap-for-legacy-blobs into master

What does this MR do?

Increases the backoff cap for retrying failed downloads of legacy blobs (job artifacts, LFS objects, and uploads) that are also missing on the primary site.

On staging.gitlab.com, many files are (intentionally) missing on the primary, so geo.staging.gitlab.com attempts to sync them every hour. We don't want to disable retries after some maximum number of attempts, because we want the system to recover automatically if the files ever appear. But retrying every hour is a bit excessive, given that all retries have failed up to that point. So this commit raises the retry time cap for legacy blobs missing on the primary from 1 hour to 4 hours. Before the cap is reached, a progressive backoff scheme is used.
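
To illustrate the capped backoff described above, here is a minimal sketch in Python. It is not the code in this MR: the doubling formula, the 1-minute base delay, and the names `retry_delay`, `BASE_DELAY`, `DEFAULT_CAP`, and `MISSING_ON_PRIMARY_CAP` are all assumptions made for illustration. Only the 1-hour and 4-hour caps come from the description.

```python
from datetime import timedelta

# Caps taken from the MR description.
DEFAULT_CAP = timedelta(hours=1)
MISSING_ON_PRIMARY_CAP = timedelta(hours=4)

# Hypothetical base delay; the real scheme's starting interval is not stated here.
BASE_DELAY = timedelta(minutes=1)


def retry_delay(retry_count: int, missing_on_primary: bool = False) -> timedelta:
    """Return how long to wait before the next sync attempt.

    Assumes a doubling backoff (an illustrative choice), clamped to a cap
    that depends on whether the file is also missing on the primary.
    """
    cap = MISSING_ON_PRIMARY_CAP if missing_on_primary else DEFAULT_CAP
    # Clamp the exponent so very large retry counts cannot overflow timedelta.
    exponent = min(max(retry_count, 0), 32)
    delay = BASE_DELAY * (2 ** exponent)
    return min(delay, cap)


if __name__ == "__main__":
    for count in (0, 3, 6, 8, 20):
        print(count, retry_delay(count), retry_delay(count, missing_on_primary=True))
```

Under these assumptions, both kinds of failure back off identically until the delay passes 1 hour. After that, only files that are missing on the primary keep growing, up to the 4-hour cap, and retries never stop entirely.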

As an aside, resources which are replicated by the Geo framework will soon gain the automatic verification and re-verification feature. This will eventually resync resources which were missing on the primary and later became available.

#294485 (closed)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Michael Kozono
