Geo: When a blob sync fails, it should not result in a Sidekiq job failure


Summary

When an error occurs during sync of a blob, the error is recorded as a sync failure in the registry row, tracked in Sentry, and logged. Errors during sync are common and expected: the majority are transient or are only encountered during first-time setup.

The error is then allowed to bubble up to the Sidekiq job (Geo::SyncWorker or Geo::EventWorker). But these uncaught errors:

  1. Look the same as catastrophic errors in Sidekiq metrics (ask a Dedicated SRE). Silencing them obscures other problems.
  2. Can cause other problems (see #524761).

Steps to reproduce

What is the current bug behavior?

Geo::SyncWorker fails if BlobDownloadService records a sync failure.

What is the expected correct behavior?

Geo::SyncWorker finishes if BlobDownloadService records a sync failure.

Implementation Guide

  1. If BlobDownloadService records a sync failure, it should swallow the error rather than re-raise it.
  2. Add a test or two that demonstrate this new behavior (or adjust existing tests to do so).
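
A minimal sketch of the intended behavior, in plain Ruby. Everything here is a hypothetical stand-in, not GitLab's actual code: `FakeRegistry`, `SketchBlobDownloadService`, `SketchSyncWorker`, the `Result` struct, and the `downloader` callable are assumptions made for illustration. The real service also tracks the error in Sentry, which is omitted here. The point is only that the service records the failure and returns a result instead of re-raising, so the worker finishes normally.

```ruby
require 'logger'

# Hypothetical stand-in for a Geo registry row.
class FakeRegistry
  attr_reader :state, :last_sync_failure

  def start!
    @state = :started
  end

  def synced!
    @state = :synced
    @last_sync_failure = nil
  end

  def failed!(message)
    @state = :failed
    @last_sync_failure = message
  end
end

# Hypothetical stand-in for BlobDownloadService.
class SketchBlobDownloadService
  Result = Struct.new(:success, :error_message)

  # downloader is any callable that performs the transfer (an assumption).
  def initialize(registry:, downloader:, logger: Logger.new(nil))
    @registry = registry
    @downloader = downloader
    @logger = logger
  end

  def execute
    @registry.start!
    @downloader.call
    @registry.synced!
    Result.new(true, nil)
  rescue StandardError => e
    # Record and log the failure, then swallow the error instead of re-raising.
    @registry.failed!(e.message)
    @logger.error("Blob sync failed: #{e.class}: #{e.message}")
    Result.new(false, e.message)
  end
end

# Hypothetical stand-in for the Sidekiq worker: it no longer sees an exception.
class SketchSyncWorker
  def perform(service)
    service.execute
    :finished
  end
end

registry = FakeRegistry.new
service = SketchBlobDownloadService.new(
  registry: registry,
  downloader: -> { raise IOError, 'connection reset' }
)
outcome = SketchSyncWorker.new.perform(service)
```

In this sketch the rescue is limited to `StandardError`, so signals and other non-standard exceptions still propagate. Whether the real service should swallow every `StandardError` or only a known set of sync errors is a design choice this sketch does not settle.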