Direct read queries of RepositoryUpdateMirrorWorker to a replica
This worker is scheduled from the ProjectImportScheduleWorker or from the web fleet when triggered manually. I think we could use data_consistency :sticky for this worker: it performs a lot of IO bound work talking to Gitaly, and only needs data from the database for validating permissions. The writes it does seem to do are related to updating the state of the import.
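As a rough illustration of what the declaration could look like, here is a minimal sketch. DataConsistencyStandIn is a hypothetical stand-in for the class-level DSL that GitLab workers normally get from ApplicationWorker; it exists only so the snippet runs outside the GitLab codebase, and the real worker would keep its existing includes and attributes.

```ruby
# Hypothetical stand-in for the worker DSL, so this sketch runs on its own.
# It is not GitLab's implementation.
module DataConsistencyStandIn
  VALID = %i[always sticky delayed].freeze

  # With an argument: record the setting. Without: return it (default :always).
  def data_consistency(value = nil)
    return @data_consistency || :always if value.nil?
    raise ArgumentError, "unknown data consistency: #{value}" unless VALID.include?(value)

    @data_consistency = value
  end
end

class RepositoryUpdateMirrorWorker
  extend DataConsistencyStandIn

  # Proposed change: prefer a replica for reads until the job performs a write.
  data_consistency :sticky
end

puts RepositoryUpdateMirrorWorker.data_consistency
```

With :sticky, reads are expected to stay on a replica only until the job writes, which is why the ordering of reads and writes inside the job matters.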
Regardless, this worker was responsible for 9.41% (at the time of writing) of the time Sidekiq spent talking to the primary database over the past 7 days. This isn't necessarily because of inefficient queries; it could simply be because of the number of these jobs we process. Either way, it would be beneficial to direct as many of the reads as possible to a read replica.
When looking at the worker, it does seem like it performs writes pretty early when running: https://gitlab.com/gitlab-org/gitlab/blob/832d65d8e791d642cd8b141e76b71dfe88a70af4/ee/app/workers/repository_update_mirror_worker.rb#L22. Right after finding the project to work on, it marks the project as "in progress". After that, however, multiple relations are still loaded for performing permission checks: checking whether or not a branch is protected, or whether a tag can be created or removed by the mirroring user. We could move as many of these reads as possible to the start of the job, before we mark the mirror as started. As far as I can tell, this entails the following relations:
- project.mirror_user
- project.creator
- project.import_state
- project.team.max_member_access(user): This is used for checking if the user can actually perform the mirror. Perhaps we can check the permissions earlier so they're memoized for the rest of the run by checking can?(current_user, :push_code_to_protected_branches, project)
- project.protected_tags and the related ProtectedTag#create_access_levels: From UpdateMirrorService#can_crate_tag?
- project.protected_branches: to check if we should pull the branch or not if we're only mirroring protected branches. I think we only need to load these if project.only_mirror_protected_branches is enabled.
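To make the effect of the reordering concrete, here is a small simulation. StickySession, current_order and proposed_order are hypothetical names invented for this sketch; the session only models the assumption that reads go to a replica until the first write and to the primary afterwards. It does not reproduce GitLab's load balancer or the worker's real code.

```ruby
# Hypothetical model of a sticky session: reads go to a replica until the
# first write, and to the primary from then on.
class StickySession
  attr_reader :replica_reads, :primary_reads

  def initialize
    @wrote = false
    @replica_reads = []
    @primary_reads = []
  end

  def read(relation)
    (@wrote ? @primary_reads : @replica_reads) << relation
  end

  def write(_statement)
    @wrote = true
  end
end

# The relations listed above, as plain labels.
RELATIONS = %w[
  mirror_user creator import_state max_member_access
  protected_tags protected_branches
].freeze

# Order described in the issue: mark the mirror as started, then load relations.
def current_order(session)
  session.read("project")
  session.write("mark import as started")
  RELATIONS.each { |relation| session.read(relation) }
  session
end

# Proposed order: load the relations first, then mark the mirror as started.
def proposed_order(session)
  session.read("project")
  RELATIONS.each { |relation| session.read(relation) }
  session.write("mark import as started")
  session
end

before = current_order(StickySession.new)
after = proposed_order(StickySession.new)

puts "current:  #{before.primary_reads.size} reads on primary"
puts "proposed: #{after.primary_reads.size} reads on primary"
```

In this model every relation read after the early write lands on the primary, while the proposed order keeps all of them on a replica. Whether the real job behaves the same way depends on the loaded records actually being reused later instead of being queried again.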
This worker only performs 7 writes, 2 of which are marking the state. I suspect most of the reads come from an N+1 when iterating over the tags we're importing, so hopefully preloading before the first write would also take care of that.
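The suspected N+1 has not been confirmed, but the shape of the problem can be sketched with a query counter. FakeDb and the tag names below are made up for illustration and do not correspond to GitLab classes or data.

```ruby
# Hypothetical stand-in that counts how often protected tags are looked up.
class FakeDb
  attr_reader :query_count

  def initialize(protected_patterns)
    @protected_patterns = protected_patterns
    @query_count = 0
  end

  # Each call represents one database query.
  def protected_tag_patterns
    @query_count += 1
    @protected_patterns
  end
end

tags = %w[v1.0 v1.1 v2.0 nightly]

# N+1 shape: the protected tags are looked up again for every tag.
n_plus_one = FakeDb.new(%w[v1.0 v2.0])
tags.each { |tag| n_plus_one.protected_tag_patterns.include?(tag) }

# Preloaded shape: one lookup up front, reused for every tag.
preloaded = FakeDb.new(%w[v1.0 v2.0])
patterns = preloaded.protected_tag_patterns
tags.each { |tag| patterns.include?(tag) }

puts "per-tag lookups: #{n_plus_one.query_count} queries"
puts "preloaded:       #{preloaded.query_count} query"
```

If the reads do follow the per-tag shape, loading the protected tags once before the first write would both cut the query count and keep that single query on a replica.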
Query counts for this worker: https://log.gprd.gitlab.net/goto/fa79b840-d1f0-11ec-b73f-692cc1ae8214