GetAllLFSPointers is called too frequently for pull mirror operations
Only update LFS objects for pull mirrors when branches were updated. This should relieve the strain on the Gitaly nodes.
We could do this by updating the Projects::UpdateMirrorService#update_branches method to return which branches were updated. If there weren't any, we can skip Projects::UpdateMirrorService#update_lfs_objects.
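The control flow proposed above could look roughly like the following. This is a minimal sketch and not the actual service: the class name UpdateMirrorServiceSketch, the hash-based branch state and the lfs_scans counter are all hypothetical stand-ins, and only the method names update_branches and update_lfs_objects come from the description.

```ruby
# Hypothetical, simplified stand-in for Projects::UpdateMirrorService.
# Only the proposed control flow is shown; the real service does much more.
class UpdateMirrorServiceSketch
  attr_reader :lfs_scans

  # upstream and local map branch names to commit SHAs (assumed shape).
  def initialize(upstream:, local:)
    @upstream = upstream
    @local = local
    @lfs_scans = 0
  end

  def execute
    updated = update_branches
    # Only pay for the expensive LFS scan when a branch actually changed.
    update_lfs_objects if updated.any?
    updated
  end

  private

  # Returns the names of branches that are new or whose head moved.
  def update_branches
    @upstream.each_with_object([]) do |(name, sha), updated|
      next if @local[name] == sha

      @local[name] = sha
      updated << name
    end
  end

  # Stand-in for the step that ends up calling GetAllLFSPointers.
  def update_lfs_objects
    @lfs_scans += 1
  end
end
```

With identical upstream and local state, execute returns an empty list and lfs_scans stays at 0, so an unchanged mirror never reaches the LFS step.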
Original problem
As discussed in #29233 (comment 254498412) and gitaly#1885 (closed), the Gitaly GetAllLFSPointers endpoint is very slow and consumes large amounts of IO as it needs to touch every object in a Git repository.
In gitlab-com/gl-infra/scalability#64, we're investigating slowdowns on GitLab.com and evidence is mounting that GetAllLFSPointers is at least partially to blame.
The following pattern seems to recur frequently:
- A large repository, often a popular open source project, is mirrored from another location on the internet. Quite often this initial operation appears to be the only interaction the user has with the repository -- after being set up it is frequently abandoned.
- Several times an hour from then onwards, the repository is pull mirrored from its upstream.
- Along with other activity, a GetAllLFSPointers call is made that iterates over every object in the repository.
- For huge repositories, this is a very expensive operation. In fact, it frequently times out on GitLab.com.
- Add this up for 80k mirrored repositories, and you get 160k very expensive GetAllLFSPointers calls per hour.
This urgently needs to be addressed.
Several possible solutions:
- Slow down the rate of mirroring on inactive projects. We perform the GetAllLFSPointers call whether or not the repository has changed.
- Skip the GetAllLFSPointers call if the repository is up to date with the upstream.
- Pause mirroring on repositories that are not accessed by a user within 6 months? a year?
- Perform a selective GetSomeLFSPointers or other application optimisations.
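To make the first and third options more concrete, here is a rough sketch of an activity-based mirroring schedule. Everything in it is an assumption for illustration: the helper next_mirror_delay is hypothetical, and the 30-day and 180-day thresholds and the 30-minute default are placeholders, not existing GitLab settings.

```ruby
# Hypothetical policy: every threshold below is an assumption for illustration.
DAY = 24 * 60 * 60

# Returns the delay in seconds before the next pull mirror run,
# or nil when mirroring should be paused entirely.
def next_mirror_delay(last_accessed_at, now: Time.now)
  idle = now - last_accessed_at       # seconds since a user last touched the project
  return nil if idle > 180 * DAY      # paused: untouched for roughly 6 months
  return DAY if idle > 30 * DAY       # inactive: mirror once a day
  30 * 60                             # active: mirror roughly twice an hour
end
```

A scheduler could call this per project and skip any project for which it returns nil. Where exactly to put the cut-offs (6 months, a year) is still the open question raised in the list above.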
As mentioned, there is evidence that this operation is consuming vast amounts of IO, slowing down other operations on GitLab.com (gitlab-com/gl-infra/scalability#64). For this reason I'm prioritising it as ~P2 ~S2 and may escalate it further if more evidence is found that this is leading to performance issues.