Prevent failed file syncs from stalling Geo backfill

Nick Thomas requested to merge (removed):3691-geo-file-backfill into master

What does this MR do?

Prevents failed Geo file syncs from filling up the entire scheduler queue and blocking backfill progress.

Are there points in the code the reviewer needs to double check?

Are we OK storing a bare success boolean?

I considered using bytes: -1 to indicate failure, as we do elsewhere, but rejected it as being too easy to mess up.

We're using null: false, default: true on the column so existing entries are (correctly) considered to be successes. I tried default: false (the more traditional choice), followed by UPDATE file_registry SET success = 't', but the UPDATE would hang indefinitely for some reason.

Why was this MR needed?

If too many files consistently fail to sync, they fill up the entire scheduling queue, causing the backfill process to stall.

This is avoided in the project registry by using a failed scope, so once we've tried to sync a project once, it is taken out of backfill whether it succeeds or not. Retries are accounted for using a secondary select.
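To illustrate the idea, here is a minimal plain-Ruby sketch of the selection logic, not the actual scheduler code. RegistryEntry, next_batch, and the policy of filling leftover slots with retries are all hypothetical names and assumptions made for this example; the real implementation queries the registry table rather than in-memory arrays.

```ruby
# Hypothetical in-memory stand-in for a registry row (not the real schema).
RegistryEntry = Struct.new(:file_id, :success)

# Pick up to `capacity` file IDs to schedule.
# Primary select: files that have never been attempted (no registry entry).
# Secondary select: previously failed files, used only to fill leftover slots
# (an assumed policy for this sketch).
def next_batch(all_file_ids, registry, capacity)
  attempted = registry.map(&:file_id)
  backfill  = (all_file_ids - attempted).first(capacity)

  remaining = capacity - backfill.size
  retries   = registry.reject(&:success).map(&:file_id).first(remaining)

  backfill + retries
end

all_file_ids = (1..6).to_a
registry = [
  RegistryEntry.new(1, false), # attempted, failed
  RegistryEntry.new(2, false), # attempted, failed
  RegistryEntry.new(3, true)   # attempted, succeeded
]

# With two slots, the failed files 1 and 2 no longer crowd out untried files.
small_batch = next_batch(all_file_ids, registry, 2)

# With five slots, backfill is exhausted first and failures fill the rest.
large_batch = next_batch(all_file_ids, registry, 5)
```

Because a failed attempt still creates a registry entry, a file leaves the backfill set after its first attempt regardless of outcome, which is the behaviour described above for the project registry.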

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #3691 (closed)

Edited by Nick Thomas
