Skip to content
GitLab
Next
Projects Groups Snippets
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
  • GitLab GitLab
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 44,761
    • Issues 44,761
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
    • Requirements
  • Merge requests 1,331
    • Merge requests 1,331
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Container Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Metrics
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • GitLab.orgGitLab.org
  • GitLabGitLab
  • Issues
  • #33650
Closed
Open
Issue created Oct 08, 2019 by Stan Hu@stanhuOwner

MergeRequestDiff#save_git_content can idle in transaction too long

We saw in #30528 (closed) that if a large repository receives lots of pushes in a short amount of time, we might have lots of jobs trying to update the latest_merge_request_diff_id column and block.

In MergeRequest#save_content, we see this:

  # Collect information about commits and diff from repository
  # and save it to the database as serialized data
  def save_git_content
    MergeRequest
      .where('id = ? AND COALESCE(latest_merge_request_diff_id, 0) < ?', self.merge_request_id, self.id)
      .update_all(latest_merge_request_diff_id: self.id)

    ensure_commit_shas
    save_commits
    save_diffs

    # Another set of `after_save` hooks will be called here when we update the record
    save
    # We need to reset so that dirty tracking is reset when running the original set
    # of `after_save` hooks that come after this `after_create` hook. Otherwise, the
    # hooks that run when an attribute was changed are run twice.
    reset

    keep_around_commits
  end

Note for imports (https://gitlab.com/gitlab-org/gitlab/blob/c4ae73abdb145db1cafea8bc0300eca785642b92/lib/gitlab/import/merge_request_helpers.rb#L54-62), we workaround this issue by calling save before save_git_content:

        # MR diffs normally use an "after_save" hook to pull data from Git.
        # All of this happens in the transaction started by calling
        # create/save/etc. This in turn can lead to these transactions being
        # held open for much longer than necessary. To work around this we
        # first save the diff, then populate it.
        diff = merge_request.merge_request_diffs.build
        diff.importing = true
        diff.save
        diff.save_git_content

I think we need to solve this once and for all.

Assignee
Assign to
Time tracking