Geo: Improve Repository Sync
This is a follow up on #1463 (closed) issue. We've implemented a partial fix for Geo (#76 (closed)), that will prevent or reduce the chance of concurrent updates filling up the repository in the secondary node with useless data.
When a git push starts the client sends a bunch of packed objects to the other side and at the end it "commit" the changes to the repository. When multiple clients are trying to update the same information at the same time, at the end only one will finish, but you will end-up with multiple waste from the other tries.
This is a situation where you have to execute
git gc to cleanup the repository.
We currently experienced this type of issue, because we were using
tag_push events from
SystemHook. There is a proposal to add a different event: gitlab-ce#26325 (closed) that will notify in a single request all the relevant data.
This is part of the solution. By not sending one notification for each changed branch in a push, we reduce the chance of concurrent updates, but if multiple different pushes happen very close to each other, there is still a chance that a few will execute concurrently with the same data (the current state of the repository).
There are two things we can do here, after start using the new SystemHook:
- We need to start increasing the repository GC counter, in the secondary node, so eventually a GC will run and will cleanup the repository
- We can also introduce an execution lock (so only one repository update for the same repository will happen at the same time).
For the execution lock, my initial idea was to use https://github.com/mhenrixon/sidekiq-unique-jobs, but as @yorickpeterse said, we can't use any gem that hijacks Sidekiq's fetching API as we already use https://github.com/brainopia/sidekiq-limit_fetch. (See discussion: !1015 (comment 20888051))
As we can't use that Gem, and we will already reduce the impact on the notification generation side, we can use a simpler alternative:
Gitlab::ExclusiveLease. This is already being used by the GC Job.
The new SystemHook will either require the current information per-branch that we use to decide upon the cache invalidation strategy, or we will have to figure out a different cache invalidation heuristic. (For the patch in gitlab-ce#26325 (closed) we are always invalidating everything, which is not ideal).