Geo: Improve handling of event log failures
In gitlab-com/migration#257 (closed) and gitlab-com/migration#259 (closed), we saw Gitaly servers going down, which prevents events from being executed properly.
It looks like we have a default of 3 retries on GeoRepositoryDestroyWorker. If Gitaly stays down for longer than those retries take, the event fails for good and is never executed on the secondary.
One issue here is that the event log gets pruned, so we can't go back in time to see which events were missed.
Ideas:
- Increase the number of retries (e.g. 20?)
- Log an event/audit trail to replay failed events
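
To make the two ideas above concrete, here is a rough sketch in plain Ruby of how they could fit together: a configurable retry limit, plus a record of events that exhausted their retries so they can be replayed later. Everything in it (`EventProcessor`, `FailedEvent`, `replay_failed`) is a hypothetical name made up for illustration, not existing GitLab or Sidekiq code, and the in-memory array only stands in for whatever durable, unpruned storage we would actually use.

```ruby
# Hypothetical sketch: none of these names exist in GitLab or Sidekiq.
FailedEvent = Struct.new(:event_id, :error, :attempts)

class EventProcessor
  attr_reader :failed_events

  def initialize(max_retries:)
    @max_retries = max_retries
    # Stand-in for a durable table that is NOT pruned with the event log.
    @failed_events = []
  end

  # Runs the handler for one event. After the first attempt it retries up to
  # max_retries times. Returns true on success; otherwise records the failure
  # and returns false.
  def process(event_id, &handler)
    attempts = 0
    begin
      attempts += 1
      handler.call(event_id)
      true
    rescue StandardError => e
      retry if attempts <= @max_retries
      @failed_events << FailedEvent.new(event_id, e.message, attempts)
      false
    end
  end

  # Re-runs every recorded failure. Events that fail again are recorded again.
  def replay_failed(&handler)
    pending = @failed_events
    @failed_events = []
    pending.each { |failed| process(failed.event_id, &handler) }
  end
end
```

With `max_retries: 3`, an event whose handler keeps raising is attempted four times and then lands in `failed_events` instead of disappearing. Once the underlying problem (e.g. Gitaly being down) is resolved, calling `replay_failed` with the same handler runs those events again. A real version would also want backoff between attempts, which this sketch leaves out.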
Other thoughts?