Skip to content

MergeRequestDiff#save_git_content can idle in transaction too long

We saw in #30528 (closed) that if a large repository receives lots of pushes in a short amount of time, we might have lots of jobs trying to update the latest_merge_request_diff_id column and block.

In MergeRequest#save_content, we see this:

  # Collect information about commits and diff from repository
  # and save it to the database as serialized data
  def save_git_content
    MergeRequest
      .where('id = ? AND COALESCE(latest_merge_request_diff_id, 0) < ?', self.merge_request_id, self.id)
      .update_all(latest_merge_request_diff_id: self.id)

    ensure_commit_shas
    save_commits
    save_diffs

    # Another set of `after_save` hooks will be called here when we update the record
    save
    # We need to reset so that dirty tracking is reset when running the original set
    # of `after_save` hooks that come after this `after_create` hook. Otherwise, the
    # hooks that run when an attribute was changed are run twice.
    reset

    keep_around_commits
  end

Note for imports (https://gitlab.com/gitlab-org/gitlab/blob/c4ae73abdb145db1cafea8bc0300eca785642b92/lib/gitlab/import/merge_request_helpers.rb#L54-62), we workaround this issue by calling save before save_git_content:

        # MR diffs normally use an "after_save" hook to pull data from Git.
        # All of this happens in the transaction started by calling
        # create/save/etc. This in turn can lead to these transactions being
        # held open for much longer than necessary. To work around this we
        # first save the diff, then populate it.
        diff = merge_request.merge_request_diffs.build
        diff.importing = true
        diff.save
        diff.save_git_content

I think we need to solve this once and for all.