Direct read queries of RepositoryUpdateMirrorWorker to a replica

This worker is scheduled from the ProjectImportScheduleWorker or from the web fleet when triggered manually. I think we could use data_consistency :sticky for this worker: it performs a lot of IO-bound work talking to Gitaly, and only needs data from the database for validating permissions. The writes it does perform seem to be related to updating the state of the import.
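To make the proposal concrete, here is a minimal sketch of what the declaration could look like. The `WorkerAttributesSketch` module and the `RepositoryUpdateMirrorWorkerSketch` class are hypothetical stand-ins so the snippet runs on its own; in the real codebase the `data_consistency` class method comes from GitLab's worker attributes, and its exact signature may differ from what is assumed here.

```ruby
# Hypothetical stand-in for GitLab's worker DSL, only so this sketch is runnable.
module WorkerAttributesSketch
  VALID_DATA_CONSISTENCY = %i[always sticky delayed].freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # With an argument: record the setting. Without: return it (default :always).
    def data_consistency(value = nil)
      return @data_consistency || :always if value.nil?

      unless VALID_DATA_CONSISTENCY.include?(value)
        raise ArgumentError, "unknown data_consistency: #{value}"
      end

      @data_consistency = value
    end
  end
end

class RepositoryUpdateMirrorWorkerSketch
  include WorkerAttributesSketch

  # Proposed setting: prefer a replica for reads, switch to the primary
  # once the job writes (assumed semantics of :sticky).
  data_consistency :sticky
end

puts RepositoryUpdateMirrorWorkerSketch.data_consistency
```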

Regardless, this worker was responsible for 9.41% (at the time of writing) of the time Sidekiq spent talking to the primary database over the past 7 days. This isn't necessarily because of inefficient queries; it could be because of the number of these jobs we process. Because of this, it would be beneficial to direct as many of the reads as possible to a read replica.

When looking at the worker, it does seem like it performs writes pretty early when running: https://gitlab.com/gitlab-org/gitlab/blob/832d65d8e791d642cd8b141e76b71dfe88a70af4/ee/app/workers/repository_update_mirror_worker.rb#L22. Right after finding the project to work on, it marks the project as "in progress". But afterwards, multiple relations are still loaded for performing permission checks: checking whether or not a branch is protected, or whether a tag can be created or removed by the mirroring user. We could move as many of these reads as possible to the start of the job, before we mark the mirror as started. As far as I can tell, this entails the following relations:

This worker only performs 7 writes, 2 of which mark the state. I suspect most of the reads come from an N+1 when iterating over the tags we're importing, so hopefully the preloading before the first write would also take care of that.
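The effect of reordering can be illustrated with a toy model. This is a sketch under assumptions, not GitLab's load balancer: `StickyRouterSketch`, `current_order` and `proposed_order` are hypothetical names, and the router simply assumes that reads go to a replica until the first write, after which everything sticks to the primary.

```ruby
# Toy router: reads use the replica until the first write, then stick to the primary.
class StickyRouterSketch
  attr_reader :log

  def initialize
    @wrote = false
    @log = [] # entries are [host, kind, description]
  end

  def read(description)
    @log << [@wrote ? :primary : :replica, :read, description]
  end

  def write(description)
    @wrote = true
    @log << [:primary, :write, description]
  end

  def reads_on(host)
    @log.count { |h, kind, _| h == host && kind == :read }
  end
end

# Roughly the current shape: an early write, then one read per tag (N+1).
def current_order(db, tags)
  db.read("project")
  db.write("mark mirror as started")
  tags.each { |tag| db.read("protected tag rule for #{tag}") }
  db.write("mark mirror as finished")
end

# Proposed shape: preload everything in one batched read before the first write.
def proposed_order(db, tags)
  db.read("project")
  db.read("protected tag rules for #{tags.size} tags")
  db.write("mark mirror as started")
  db.write("mark mirror as finished")
end

tags = %w[v1.0 v1.1 v2.0]

before = StickyRouterSketch.new
current_order(before, tags)

after = StickyRouterSketch.new
proposed_order(after, tags)

puts "current:  #{before.reads_on(:primary)} primary reads, #{before.reads_on(:replica)} replica reads"
puts "proposed: #{after.reads_on(:primary)} primary reads, #{after.reads_on(:replica)} replica reads"
```

In this model the number of writes stays the same, but the per-tag reads no longer hit the primary, and batching them removes the N+1 at the same time.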

Query counts for this worker: https://log.gprd.gitlab.net/goto/fa79b840-d1f0-11ec-b73f-692cc1ae8214

Edited by Sean Carroll