Fix merge pre-receive errors when load balancing in use
When a user merges a merge request, the following sequence happens:

1. Sidekiq: `MergeService` locks the state of the merge request in the DB.
2. Gitaly: The `UserMergeBranch` RPC runs and creates a merge commit.
3. Sidekiq: `Repository#merge` and `Repository#ff_merge` update the `in_progress_merge_commit_sha` database column with this merge commit SHA.
4. Gitaly: The `UserMergeBranch` RPC runs again and applies this merge commit to the target branch.
5. Gitaly (gitaly-ruby): This RPC calls the pre-receive hook.
6. Rails: This hook makes an API request to `/api/v4/internal/allowed`.
7. Rails: This API check makes a SQL query for locked merge requests with a matching SHA.
Since the writes in steps 1 and 3 and the read in step 7 happen in different database sessions, replication lag could erroneously cause step 7 to report no matching merge requests. To avoid this, we have a few options:
1. Wrap step 7 in a transaction. The EE load balancer will always direct these queries to the primary.
2. Always force the load balancer session to use the primary for this query.
3. Use the load balancing sticking mechanism to use the primary until the secondaries have caught up to the right write location.
Option 1 isn't great because on GitLab.com this query can take 500 ms to run, and long-running read transactions are not desirable.
Option 2 is simpler and guarantees that we will always have a consistent read. However, none of these queries would ever be routed to a secondary, which may cause undue load on the primary.
We go with option 3. Whenever the `in_progress_merge_commit_sha` column is updated, we mark the write location of the primary. Then in `MatchingMergeRequest#match?`, we stick to the primary if the replica has not caught up to that location.
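The sticking mechanism can be sketched with a toy balancer (illustrative names only, not GitLab's actual classes): record the primary's write location when the SHA column changes, then route reads to the primary only while the replica is still behind that location.

```ruby
# Toy load balancer that "sticks" reads to the primary until the replica
# has caught up to a recorded write location.
class StickyBalancer
  def initialize
    @primary_lsn = 0
    @replica_lsn = 0
    @sticky_lsn  = nil
  end

  # Called whenever in_progress_merge_commit_sha is updated: advance the
  # primary's write location and remember it for later reads.
  def mark_primary_write!
    @primary_lsn += 1
    @sticky_lsn = @primary_lsn
  end

  # Replication eventually replays the primary's writes on the replica.
  def replicate!
    @replica_lsn = @primary_lsn
  end

  # Roughly the decision MatchingMergeRequest#match? needs: use the
  # primary while the replica is behind the recorded write location,
  # otherwise unstick and use the replica.
  def host_for_read
    if @sticky_lsn && @replica_lsn < @sticky_lsn
      :primary
    else
      :replica
    end
  end
end

lb = StickyBalancer.new
lb.mark_primary_write!
lb.host_for_read # => :primary (replica still behind the marked location)
lb.replicate!
lb.host_for_read # => :replica (caught up, safe to read from a secondary)
```

Unlike option 2, reads fall back to the secondaries as soon as replication catches up, so the primary only absorbs this query during the lag window.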
Relates to #247857 (closed)