Skip to content

Fix merge pre-receive errors when load balancing in use

Stan Hu requested to merge sh-stick-primary-matching-mrs into master

When a user merges a merge request, the following sequence happens:

  1. Sidekiq: MergeService locks the state of the merge request in the DB.
  2. Gitaly: UserMergeBranch RPC runs and creates a merge commit.
  3. Sidekiq: Repository#merge and Repository#ff_merge updates the in_progress_merge_commit_sha database column with this merge commit SHA.
  4. Gitaly: UserMergeBranch RPC runs again and applies this merge commit to the target branch.
  5. Gitaly (gitaly-ruby): This RPC calls the pre-receive hook.
  6. Rails: This hook makes an API request to /api/v4/internal/allowed.
  7. Rails: This API check makes a SQL query for locked merge requests with a matching SHA.

Since steps 1 and 7 will happen in different database sessions, replication lag could erroneously cause step 7 to report no matching merge requests. To avoid this, we have a few options:

  1. Wrap step 7 in a transaction. The EE load balancer will always direct these queries to the primary.
  2. Always force the load balancer session to use the primary for this query.
  3. Use the load balancing sticking mechanism to use the primary until the secondaries have caught up to the right write location.

Option 1 isn't great because on GitLab.com, this query can take 500 ms to run, and long-running, read transactions are not desirable.

Option 2 is simpler and guarantees that we will always have a consistent read. However, none of these queries will ever be routed to a secondary, and this may cause undo load on the primary.

We go with option 3. Whenever the in_progress_merge_commit_sha is updated, we mark the write location of the primary. Then in MatchingMergeRequest#match?, we stick to the primary if the replica has not caught up to that location.

Relates to #247857 (closed)

Edited by Stan Hu

Merge request reports