Geo: Increase resync backoff cap for legacy blobs missing on primary (!50812) · Merge requests · GitLab.org / GitLab

Michael Kozono requested to merge mk/increase-backoff-cap-for-legacy-blobs into master Jan 05, 2021

What does this MR do?

Increase the backoff cap for retrying failed downloads of legacy blobs (Job artifacts, LFS objects, and Uploads) which are also missing on the primary site.

On staging.gitlab.com, many files are (intentionally) missing on the primary, so geo.staging.gitlab.com attempts to sync them every hour. We don't want to stop retrying after some maximum number of attempts, because we want the system to recover automatically if the files ever appear. But retrying every hour is excessive, given that every retry up to that point has failed. So this commit raises the retry time cap for legacy blobs missing on the primary from 1 hour to 4 hours. Until the cap is reached, a progressive backoff scheme is used.
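
To illustrate the idea of a progressive backoff with a cap, here is a minimal sketch in Python. It is not the code changed by this MR: only the 1 hour and 4 hour caps come from the description above, while the function name `next_retry_delay`, the 1 minute base delay, and the doubling scheme are assumptions made for illustration.

```python
from datetime import timedelta

# Caps taken from the MR description.
OLD_CAP = timedelta(hours=1)
NEW_CAP = timedelta(hours=4)

# Hypothetical starting delay; the real base delay is not stated in the MR.
BASE_DELAY = timedelta(minutes=1)


def next_retry_delay(retry_count: int, cap: timedelta,
                     base: timedelta = BASE_DELAY) -> timedelta:
    """Return the wait before the next retry.

    Assumed scheme: the delay doubles with each failed attempt and is
    clamped to `cap`, so retries never stop but also never become more
    frequent than once per `cap`.
    """
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    # Bound the exponent so the intermediate timedelta stays representable.
    exponent = min(retry_count, 30)
    return min(base * (2 ** exponent), cap)


if __name__ == "__main__":
    for count in (0, 3, 6, 8, 50):
        print(count, next_retry_delay(count, OLD_CAP), next_retry_delay(count, NEW_CAP))
```

Under these assumptions, a file that stays missing ends up being retried once every 4 hours (6 attempts per day) rather than once every hour (24 attempts per day), while still being picked up if it ever appears.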

As an aside, resources which are replicated by the Geo framework will soon gain the automatic verification and re-verification feature. This will eventually resync resources which were missing on the primary and have since appeared.

#294485 (closed)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.

Edited Jan 06, 2021 by Michael Kozono
