MergeRequestDiff#save_git_content can idle in transaction too long
We saw in #30528 (closed) that if a large repository receives lots of pushes in a short amount of time, we might have lots of jobs trying to update the
latest_merge_request_diff_id column and block.
MergeRequest#save_content, we see this:
# Collect information about commits and diff from repository # and save it to the database as serialized data def save_git_content MergeRequest .where('id = ? AND COALESCE(latest_merge_request_diff_id, 0) < ?', self.merge_request_id, self.id) .update_all(latest_merge_request_diff_id: self.id) ensure_commit_shas save_commits save_diffs # Another set of `after_save` hooks will be called here when we update the record save # We need to reset so that dirty tracking is reset when running the original set # of `after_save` hooks that come after this `after_create` hook. Otherwise, the # hooks that run when an attribute was changed are run twice. reset keep_around_commits end
Note for imports (https://gitlab.com/gitlab-org/gitlab/blob/c4ae73abdb145db1cafea8bc0300eca785642b92/lib/gitlab/import/merge_request_helpers.rb#L54-62), we workaround this issue by calling
# MR diffs normally use an "after_save" hook to pull data from Git. # All of this happens in the transaction started by calling # create/save/etc. This in turn can lead to these transactions being # held open for much longer than necessary. To work around this we # first save the diff, then populate it. diff = merge_request.merge_request_diffs.build diff.importing = true diff.save diff.save_git_content
I think we need to solve this once and for all.