Sticky failures of "Missing on Primary" attachments

While working on https://gitlab.com/gitlab-com/migration/issues/365 I found a few failed syncs that were fixed when I ran Geo::FileDownloadService manually for them. What's interesting is that their retry_count was set to a pretty big number (more than 10), and the next retry date was far in the future, days away.

Imagine the case where we have a "missing on primary" (MoP) sync, which is marked as successfully synced. We never reset retry_count because we keep trying to sync those "missing on primary" files forever. As a result, all MoP entries end up with a pretty high retry_count. If we hit a network issue or some other short-lived problem, and MoP syncs happen to run at that particular time, they are marked as success: false, but the next retry is scheduled days or weeks away. That makes MoP failures "sticky": they spoil the overall sync statistics, but Geo will do nothing about them for days or weeks. I don't think that's optimal. WDYT?
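To illustrate the effect, here is a minimal sketch, not Geo's actual code. It assumes the retry delay grows exponentially with retry_count; the `Registry` struct, `next_retry_delay`, `record_sync_result`, the base delay, and the one-week cap are all hypothetical names and values chosen for the example.

```ruby
# Hypothetical sketch: the names and the backoff formula are assumptions,
# not Geo's real implementation.
MAX_DELAY = 7 * 24 * 60 * 60 # assumed cap of one week, in seconds

# Assumed exponential backoff: one minute doubled per retry, up to MAX_DELAY.
def next_retry_delay(retry_count)
  [(2**retry_count) * 60, MAX_DELAY].min
end

# Minimal stand-in for a registry row tracking one file's sync state.
Registry = Struct.new(:retry_count, :success, :retry_at)

# Records the outcome of one sync attempt. `now` is a timestamp in seconds.
def record_sync_result(registry, success:, missing_on_primary:, now:)
  registry.success = success
  if success && !missing_on_primary
    # A normal successful sync clears the retry state.
    registry.retry_count = 0
    registry.retry_at = nil
  else
    # MoP "successes" and real failures both keep growing retry_count.
    registry.retry_count += 1
    registry.retry_at = now + next_retry_delay(registry.retry_count)
  end
  registry
end

# A MoP entry that has already been retried 12 times hits a transient failure.
mop = Registry.new(12, true, nil)
record_sync_result(mop, success: false, missing_on_primary: false, now: 0)

# A fresh file hits the same transient failure.
fresh = Registry.new(0, true, nil)
record_sync_result(fresh, success: false, missing_on_primary: false, now: 0)
```

Under these assumptions the fresh file is retried two minutes later, while the MoP entry waits several days for the same transient failure, which is the "sticky" behaviour described above.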

/cc @mkozono

Edited May 18, 2018 by Valery Sizov