Prevent failed file syncs from stalling Geo backfill
What does this MR do?
Prevents failed Geo file syncs from filling up the entire scheduler queue and blocking backfill progress.
Are there points in the code the reviewer needs to double check?
Are we OK storing a bare success?
I considered using bytes: -1 to indicate failure, as we do elsewhere, but rejected it as being too easy to mess up.
We're using null: false, default: true on the column so existing entries are (correctly) considered to be successes. I tried with a default: false (more traditional), followed by UPDATE file_registry SET success = 't', but that would hang indefinitely for some reason.
Why was this MR needed?
If too many files consistently fail, they fill up the entire scheduling queue, and the backfill process stalls.
This is avoided in the project registry by using a failed scope, so once we've tried to sync a project once, it is taken out of backfill whether it succeeds or not. Retries are accounted for using a secondary select.
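As a rough illustration of the scheduling idea (not the actual Geo code), here is a minimal plain-Ruby sketch. The Registry struct, the method names, and the choice to fill the batch with never-attempted files before topping it up with retries are all assumptions made for the example; the only parts taken from this MR are the success flag and the split between a backfill select and a secondary retry select.

```ruby
# Hypothetical in-memory stand-in for a file registry row; not GitLab's model.
Registry = Struct.new(:file_id, :success)

# Backfill candidates: files that have never been attempted (no registry row).
def unsynced_ids(all_file_ids, registry)
  all_file_ids - registry.map(&:file_id)
end

# Retry candidates: files that were attempted but whose last sync failed.
def failed_ids(registry)
  registry.reject(&:success).map(&:file_id)
end

# Assumed policy: take backfill work first, then use any remaining slots for
# retries, so persistent failures cannot occupy the whole batch.
def next_batch(all_file_ids, registry, batch_size)
  backfill = unsynced_ids(all_file_ids, registry).first(batch_size)
  retries = failed_ids(registry).first(batch_size - backfill.size)
  backfill + retries
end

all_files = (1..6).to_a
registry = [
  Registry.new(1, false), # attempted, failed
  Registry.new(2, false), # attempted, failed
  Registry.new(3, true)   # attempted, succeeded
]

small_batch = next_batch(all_files, registry, 2)
large_batch = next_batch(all_files, registry, 5)
```

With a batch size of 2, the two failed files (1 and 2) do not take any slots, so files 4 and 5 are scheduled. With a batch size of 5, the three untried files are scheduled first and the leftover slots go to the failed files.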
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Tests added for this feature/bug
- Review
  - Has been reviewed by Backend
- Conform by the merge request performance guides
- Conform by the style guides
- Squashed related commits together
What are the relevant issue numbers?
Closes #3691