GitLab.org / GitLab FOSS, issue #57317
Issue created Feb 05, 2019 by Stan Hu (@stanhu)

Bring back a subset of Rugged calls under a feature flag

Recently a customer noticed queued Unicorn workers and increased load after upgrading to GitLab 11.5.3 from 10.8.7:

(image not preserved)

More importantly, the total number of active and queued Unicorn workers also went up:

Before: (two images, not preserved)

After: (two images, not preserved)

We observed that many of the processes were git cat-file processes waiting in the D state (uninterruptible disk sleep), which usually means they are waiting on I/O from the NFS server. This explains why the load average increased (it counts processes in uninterruptible sleep as well as runnable ones) without a corresponding increase in CPU usage.
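For anyone who wants to check for the same symptom, here is a minimal sketch that counts D-state processes in text captured from `ps -eo pid,stat,args`. The function name `count_d_state` and the sample output are made up for illustration; they are not taken from the customer's system.

```python
def count_d_state(ps_output: str, command_substring: str = "git cat-file") -> int:
    """Count processes in uninterruptible sleep whose command line
    contains command_substring.

    Expects text in the shape produced by `ps -eo pid,stat,args`:
    a header line, then one "PID STAT COMMAND..." line per process.
    """
    count = 0
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)  # PID, STAT, rest of the command line
        if len(parts) < 3:
            continue
        _pid, stat, command = parts
        # STAT starts with "D" for uninterruptible sleep; extra flags may follow.
        if stat.startswith("D") and command_substring in command:
            count += 1
    return count


# Illustrative sample only, not real data from the incident.
SAMPLE = """\
  PID STAT COMMAND
 1201 D    git cat-file --batch
 1202 Ss   unicorn worker[0]
 1203 D+   git cat-file --batch-check
 1204 R    git upload-pack /repo.git
"""

if __name__ == "__main__":
    print(count_d_state(SAMPLE))
```

A count that stays high while CPU usage is flat would be consistent with the I/O wait described above, though it does not by itself identify the NFS server as the cause.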

After we applied the merge requests below, which revert the listed Gitaly RPCs to their Rugged implementations, the system appeared to perform much better. These are the 11.5 ports:

  • FindCommit: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9377
  • GetTreeEntries: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9403
  • TreeEntry: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9404
  • CommitIsAncestor: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9405
  • CommitTreeEntry: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9989
  • FindDefaultBranchName: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9529 (not needed)

11.9 ports:

  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25477
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25702
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25706
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25674
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25896
  • ListCommitsByOid: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27441

We suspect this may be due to a number of reasons:

  1. Increased I/O due to reloading refs and pack files. For example, loading the merge request widget (e.g. http://gitlab.example.com/TRYME/test-gitlab-bug2/merge_requests/2.json?serializer=widge) causes two FindCommit requests to be issued: one for the source branch, and one for the target branch. Previously we could reuse the same Rugged::Repository and avoid loading the repo pack file twice.
  2. N+1 queries introduced by the Gitaly implementation (e.g. https://gitlab.com/gitlab-org/gitlab-ce/issues/57107, https://gitlab.com/gitlab-org/gitlab-ce/issues/57114, https://gitlab.com/gitlab-org/gitlab-ce/issues/57113)
  3. NFS on spinning disk vs. SSDs. Spinning disk has much lower IOPS, which can slow random I/O accesses.
  4. git home directory mounted on an NFS directory. Any git process that runs will read the home .git/config and other files, which will slow things down.
  5. Users hitting the API hard and increased load from git upload-pack processes. Today we saw a node with 392 git upload-pack processes launched at the beginning of the hour.
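To make reason 1 concrete, here is a toy sketch of the difference between opening the repository once per lookup and sharing one handle across lookups. `FakeRepository` is a hypothetical stand-in, not Rugged or Gitaly code; it only counts how often the expensive open step happens.

```python
class FakeRepository:
    """Hypothetical stand-in for a repository handle.

    Constructing it models the expensive step (reading refs and pack
    files from disk); `opens` counts how often that happens.
    """

    opens = 0

    def __init__(self, path: str) -> None:
        FakeRepository.opens += 1
        self.path = path
        # Made-up ref-to-commit mapping for illustration.
        self._commits = {"source": "a1", "target": "b2"}

    def find_commit(self, ref: str):
        return self._commits.get(ref)


def find_commits_per_request(path: str, refs):
    # One open per lookup, as when each lookup is an independent call.
    return [FakeRepository(path).find_commit(ref) for ref in refs]


def find_commits_shared(path: str, refs):
    # One open shared by every lookup in the same request.
    repo = FakeRepository(path)
    return [repo.find_commit(ref) for ref in refs]


if __name__ == "__main__":
    FakeRepository.opens = 0
    find_commits_per_request("/repo.git", ["source", "target"])
    print("per request:", FakeRepository.opens)

    FakeRepository.opens = 0
    find_commits_shared("/repo.git", ["source", "target"])
    print("shared:", FakeRepository.opens)
```

Both functions return the same commits; only the number of opens differs, which is the extra I/O suspected in the merge request widget case.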

Until we have fully proven out Gitaly atop NFS, we should have a feature flag that allows use of the Rugged implementations of the aforementioned RPCs.
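A rough sketch of the dispatch such a flag implies, written in Python for brevity. The `FeatureFlags` class, the flag name `rugged_find_commit`, and both backend functions are hypothetical placeholders; the real flag names and call sites would be decided in the merge requests.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store; every flag is off by default."""

    def __init__(self) -> None:
        self._enabled = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def find_commit_rugged(ref: str) -> str:
    # Placeholder for the direct, in-process Rugged lookup.
    return f"rugged:{ref}"


def find_commit_gitaly(ref: str) -> str:
    # Placeholder for the Gitaly FindCommit RPC.
    return f"gitaly:{ref}"


def find_commit(ref: str, flags: FeatureFlags) -> str:
    # Gitaly stays the default; Rugged is used only when the flag is on.
    if flags.is_enabled("rugged_find_commit"):
        return find_commit_rugged(ref)
    return find_commit_gitaly(ref)


if __name__ == "__main__":
    flags = FeatureFlags()
    print(find_commit("master", flags))
    flags.enable("rugged_find_commit")
    print(find_commit("master", flags))
```

Keeping Gitaly as the default path means the Rugged code only runs where someone has explicitly opted in, for example on installations that still serve repositories over NFS.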

/cc: @jacobvosmaer-gitlab, @jwoods06, @lbot, @dblessing, @tcooney

Edited Apr 18, 2019 by Stan Hu