MergeRequestDiff#save_git_content can idle in transaction too long
We saw in #30528 (closed) that if a large repository receives lots of pushes in a short amount of time, we might have lots of jobs trying to update the latest_merge_request_diff_id
column and block.
In MergeRequest#save_content
, we see this:
# Collect information about commits and diff from repository
# and save it to the database as serialized data
def save_git_content
MergeRequest
.where('id = ? AND COALESCE(latest_merge_request_diff_id, 0) < ?', self.merge_request_id, self.id)
.update_all(latest_merge_request_diff_id: self.id)
ensure_commit_shas
save_commits
save_diffs
# Another set of `after_save` hooks will be called here when we update the record
save
# We need to reset so that dirty tracking is reset when running the original set
# of `after_save` hooks that come after this `after_create` hook. Otherwise, the
# hooks that run when an attribute was changed are run twice.
reset
keep_around_commits
end
Note for imports (https://gitlab.com/gitlab-org/gitlab/blob/c4ae73abdb145db1cafea8bc0300eca785642b92/lib/gitlab/import/merge_request_helpers.rb#L54-62), we workaround this issue by calling save
before save_git_content
:
# MR diffs normally use an "after_save" hook to pull data from Git.
# All of this happens in the transaction started by calling
# create/save/etc. This in turn can lead to these transactions being
# held open for much longer than necessary. To work around this we
# first save the diff, then populate it.
diff = merge_request.merge_request_diffs.build
diff.importing = true
diff.save
diff.save_git_content
I think we need to solve this once and for all.