Schedule pull mirrors per-shard

Problem to solve

Pull mirroring is an expensive operation for GitLab: gitlab-com/gl-infra/scalability#78 (comment 262385767). Currently, our limits apply globally, but the cost is better understood as being paid per shard. Scheduling 16 pull mirrors on a single shard is very different from scheduling 16 pull mirrors across 16 shards.

We have to design each shard for peak, rather than average, load, and random variation in pull mirror scheduling can cause the peak on a given shard to differ widely between runs.

Intended users

Sidney (Systems Administrator)

Further details

Conceptually, pull mirroring is quite similar to the work that a Geo secondary has to do, except we lack information about when an upstream has become outdated, so we must poll. Geo also used to have a single scheduling pool and a single application limit. They were made per-shard as part of the "make Geo work for GitLab.com" effort, and I think it's worth doing the same here.

Proposal

Schedule one UpdateAllMirrorsWorker per shard, and have it schedule pull mirrors for that shard only. Convert all existing limits to be per-shard limits.
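
To make the idea concrete, here is a minimal Python sketch of per-shard scheduling. It is not GitLab's implementation: the `Mirror` record, its `next_update_at` field, the `schedule_per_shard` function, and the `per_shard_capacity` parameter are all hypothetical names chosen for illustration. The sketch assumes each mirror knows which shard it lives on and that the longest-overdue mirrors should go first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mirror:
    project_id: int
    shard: str           # shard the project's repository lives on
    next_update_at: int  # time at which the mirror becomes due (hypothetical field)


def schedule_per_shard(mirrors, now, per_shard_capacity, running=None):
    """Pick due mirrors shard by shard, never exceeding each shard's limit.

    `running` maps shard name -> number of pulls already in flight there.
    Returns a dict of shard name -> list of mirrors to start now.
    """
    running = running or {}

    # Group the mirrors that are due by the shard they live on.
    due_by_shard = defaultdict(list)
    for mirror in mirrors:
        if mirror.next_update_at <= now:
            due_by_shard[mirror.shard].append(mirror)

    scheduled = {}
    for shard, due in due_by_shard.items():
        free_slots = max(per_shard_capacity - running.get(shard, 0), 0)
        # Longest-overdue mirrors go first within the shard.
        due.sort(key=lambda m: m.next_update_at)
        scheduled[shard] = due[:free_slots]
    return scheduled
```

Each shard's selection depends only on that shard's own backlog and in-flight count, so a busy shard cannot consume slots that another shard could have used, and vice versa.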

Total concurrency will stay the same, but scheduling will be more even in the contended case. This smooths out the peaks, so pull mirroring becomes a more consistent load on each shard.
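
The effect on peaks can be illustrated with a small, deterministic Python example. It is a sketch under stated assumptions, not a model of production behaviour: the function names are hypothetical, the backlog is ordered so that one shard's mirrors come first (the worst case for a global limit), and the per-shard limit is assumed to be the global limit divided by the number of shards.

```python
from collections import Counter


def pick_global(due_shards, global_limit):
    """Global limit: take the first N due mirrors regardless of shard."""
    return due_shards[:global_limit]


def pick_per_shard(due_shards, per_shard_limit):
    """Per-shard limit: take due mirrors until each shard's quota is used."""
    taken = Counter()
    picked = []
    for shard in due_shards:
        if taken[shard] < per_shard_limit:
            taken[shard] += 1
            picked.append(shard)
    return picked


def peak_shard_load(picked):
    """Largest number of concurrent pulls landing on any single shard."""
    return max(Counter(picked).values(), default=0)


# 16 shards, each with 16 due mirrors; shard-0's backlog is first in line.
due = [f"shard-{i}" for i in range(16) for _ in range(16)]

global_pick = pick_global(due, global_limit=16)
shard_pick = pick_per_shard(due, per_shard_limit=1)  # assumed: 16 / 16 shards

print(len(global_pick), peak_shard_load(global_pick))
print(len(shard_pick), peak_shard_load(shard_pick))
```

Both strategies start 16 pulls in total. With this ordering, the global limit puts all 16 on one shard, while the per-shard limit puts one on each.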

If some shards exhibit much longer waits between pull mirror operations than others, that is a signal to rebalance shards to remove hotspots.

Edited Dec 18, 2019 by Nick Thomas