High rate of config.lock file errors on Geo testbed
I believe we recently added a new Sidekiq worker, RepositoryRemoveRemoteWorker. On the Geo testbed, we now seem to have a high number of stale config.lock files:
https://sentry.gitlap.com/gitlab/geo1/issues/118898/
First of all, why do we have this many remote removals on an instance that is basically idle? I think it's because we call Repository#fetch_remote without a remote name in https://gitlab.com/gitlab-org/gitlab-ee/blob/1efb8287d29b08086fe2719c6ef5b9b2e30dba8a/ee/app/services/geo/base_sync_service.rb#L85, which then calls https://gitlab.com/gitlab-org/gitlab-ee/blob/1efb8287d29b08086fe2719c6ef5b9b2e30dba8a/app/models/repository.rb#L1000.
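To illustrate the churn, here is a minimal, self-contained sketch of that fallback path. All names here are illustrative stand-ins, not the actual Repository implementation: the point is that an anonymous fetch adds a throwaway remote and then removes it, and each add/remove rewrites the repository's git config, which is exactly the operation that takes config.lock.

```ruby
require 'securerandom'

# Hypothetical model of the fetch_remote fallback (names are
# illustrative): with no remote name, a temporary remote is added
# before the fetch and removed afterwards. Each add/remove rewrites
# the git config, which requires taking config.lock.
class RepositorySketch
  attr_reader :config_writes, :remotes

  def initialize
    @remotes = {}
    @config_writes = 0 # each add/remove takes config.lock once
  end

  def add_remote(name, url)
    @remotes[name] = url
    @config_writes += 1
  end

  def remove_remote(name)
    @remotes.delete(name)
    @config_writes += 1
  end

  def fetch_remote(url, remote_name: nil)
    if remote_name
      fetch(remote_name) # fixed remote: no config rewrite needed
    else
      # anonymous fetch: add a throwaway remote, fetch, then remove it
      tmp = "tmp-#{SecureRandom.hex(4)}"
      add_remote(tmp, url)
      fetch(tmp)
      remove_remote(tmp) # in GitLab the removal is done by a worker
    end
  end

  private

  def fetch(name)
    # no-op stand-in for the actual git fetch
  end
end

repo = RepositorySketch.new
3.times { repo.fetch_remote('https://primary.example.com/repo.git') }
puts repo.config_writes # 6: two config rewrites per anonymous fetch
```

So every periodic sync pays two lock-protected config writes, and any crash or race in between leaves a config.lock behind.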
It seems to me that Geo shouldn't ever need to add/remove remotes, since there should be a fixed remote between the primary and secondary.
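A self-contained sketch of that fixed-remote idea (all names hypothetical): the remote is registered once per repository, so repeated syncs never touch the git config again and never need config.lock.

```ruby
# Hypothetical sketch of a Geo sync service using a fixed remote.
# The remote is created on first use; every later sync reuses it,
# so there is no per-fetch add/remove (and no config.lock churn).
class GeoSyncSketch
  GEO_REMOTE = 'geo'.freeze

  attr_reader :config_writes

  def initialize(primary_url)
    @primary_url = primary_url
    @remotes = {}      # stands in for the repo's git config
    @config_writes = 0 # each config write needs config.lock
  end

  def sync
    ensure_geo_remote
    fetch(GEO_REMOTE)  # plain named fetch; no add/remove per sync
  end

  private

  def ensure_geo_remote
    return if @remotes.key?(GEO_REMOTE)

    @remotes[GEO_REMOTE] = @primary_url
    @config_writes += 1
  end

  def fetch(name)
    # no-op stand-in for the actual git fetch
  end
end

svc = GeoSyncSketch.new('https://primary.example.com/repo.git')
5.times { svc.sync }
puts svc.config_writes # 1: only the initial remote registration
```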
I consider this a high-priority fix because I can foresee a lot of Geo replication failing as a result.
If we do hit this error, it would be nice to log the failures somewhere that makes it easy to find and clean up the stale lock files.
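For reference, here is a hedged sketch of the kind of cleanup such logging would enable: remove config.lock files whose mtime is older than some threshold, on the assumption that any legitimate lock holder finishes quickly. The threshold and directory layout below are illustrative, not actual GitLab values.

```ruby
require 'fileutils'
require 'tmpdir'

# Assumed threshold for illustration, not a real GitLab setting.
STALE_AFTER = 15 * 60 # seconds

# Walk a storage root and delete config.lock files older than
# STALE_AFTER; returns the paths that were removed.
def prune_stale_config_locks(storage_root, now: Time.now)
  Dir.glob(File.join(storage_root, '**', 'config.lock')).select do |lock|
    if now - File.mtime(lock) > STALE_AFTER
      FileUtils.rm_f(lock)
      true
    end
  end
end

# Usage example against a throwaway directory:
Dir.mktmpdir do |root|
  repo = File.join(root, 'group', 'project.git')
  FileUtils.mkdir_p(repo)
  lock = File.join(repo, 'config.lock')
  FileUtils.touch(lock, mtime: Time.now - 3600) # hour-old stale lock
  removed = prune_stale_config_locks(root)
  puts removed.length    # 1
  puts File.exist?(lock) # false
end
```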
/cc: @tiagonbotelho, @DouweM