Closed
Open
Opened Feb 05, 2019 by Stan Hu (@stanhu)

Bring back a subset of Rugged calls under a feature flag

Recently a customer noticed queued Unicorn workers and increased load after upgrading to GitLab 11.5.3 from 10.8.7:

image

More importantly, the total number of active and queued Unicorn workers also went up:

Before

image

image

After

image

image

We observed that many of the processes were git cat-file processes waiting in the D state (uninterruptible disk sleep), which usually indicates I/O wait on the NFS server. This explains the increased load without a corresponding increase in CPU usage: on Linux, the load average counts processes in uninterruptible sleep as well as runnable ones.
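
A rough way to quantify this kind of observation is to group D-state processes by command. The sketch below is an illustration only: it parses text in the shape produced by `ps -eo stat,args`, the helper name `d_state_counts` is made up for this example, and the sample input is hypothetical rather than data from the affected system.

```ruby
# Count processes in uninterruptible sleep (state D), grouped by command.
# Each input line is expected to start with the process state, followed by
# the command line, as in the output of `ps -eo stat,args`.
def d_state_counts(ps_output)
  counts = Hash.new(0)
  ps_output.each_line do |line|
    stat, args = line.strip.split(/\s+/, 2)
    next if stat.nil? || args.nil?
    # The header line ("STAT COMMAND") is skipped here too, since it does
    # not start with "D".
    next unless stat.start_with?("D")

    # Group by the first two words, e.g. "git cat-file".
    counts[args.split.first(2).join(" ")] += 1
  end
  counts
end

# Hypothetical sample input, not taken from the customer's system.
SAMPLE = <<~PS
  STAT COMMAND
  D    git cat-file --batch
  Ss   unicorn master
  D    git cat-file --batch-check
  R    ps -eo stat,args
  D+   git upload-pack /var/opt/repo.git
PS

d_state_counts(SAMPLE).each { |command, count| puts "#{count} #{command}" }
```

On a live host the same function could be fed the real `ps` output instead of `SAMPLE`.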

After applying the merge requests below, which revert the following Gitaly RPCs to their Rugged implementations, the system appeared to perform much better. These are the 11.5 ports:

  • FindCommit: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9377
  • GetTreeEntries: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9403
  • TreeEntry: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9404
  • CommitIsAncestor: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9405
  • CommitTreeEntry: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9989
  • FindDefaultBranchName: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/9529 (not needed)

11.9 ports:

  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25477
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25702
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25706
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25674
  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25896
  • ListCommitsByOid: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/27441

We suspect this may be due to a number of reasons:

  1. Increased I/O due to reloading refs and pack files. For example, loading the merge request widget (e.g. http://gitlab.example.com/TRYME/test-gitlab-bug2/merge_requests/2.json?serializer=widge) causes two FindCommit requests to be issued: one for the source branch, and one for the target branch. Previously we could reuse the same Rugged::Repository and avoid loading the repo pack file twice.
  2. N+1 queries introduced by the Gitaly implementation (e.g. https://gitlab.com/gitlab-org/gitlab-ce/issues/57107, https://gitlab.com/gitlab-org/gitlab-ce/issues/57114, https://gitlab.com/gitlab-org/gitlab-ce/issues/57113)
  3. NFS on spinning disk vs. SSDs. Spinning disk has much lower IOPS, which can slow random I/O accesses.
  4. git home directory mounted on an NFS directory. Any git process that runs will read the home .git/config and other files, which will slow things down.
  5. Users hitting the API hard and increased load from git upload-pack processes. Today we saw a node with 392 git upload-pack processes launched at the beginning of the hour.
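
The first reason can be illustrated with a small sketch. It does not use the Rugged or Gitaly APIs: `FakeRepository`, `RequestScope`, and `find_commit_fresh` are hypothetical stand-ins whose only purpose is to count how often an "expensive" repository open happens when two lookups (source and target branch) are made per request.

```ruby
# Hypothetical stand-in for an expensive repository handle; it is NOT Rugged.
class FakeRepository
  @opens = 0
  class << self
    attr_accessor :opens
  end

  def initialize(path)
    @path = path
    self.class.opens += 1 # stands in for reading refs and pack files
  end

  def lookup(ref)
    "#{@path}@#{ref}"
  end
end

# One handle per call: every lookup pays the open cost again.
def find_commit_fresh(path, ref)
  FakeRepository.new(path).lookup(ref)
end

# One handle per request: lookups share the already-opened repository.
class RequestScope
  def initialize(path)
    @path = path
  end

  def find_commit(ref)
    repository.lookup(ref)
  end

  private

  def repository
    @repository ||= FakeRepository.new(@path)
  end
end

FakeRepository.opens = 0
%w[source-branch target-branch].each { |ref| find_commit_fresh("repo.git", ref) }
fresh_opens = FakeRepository.opens

FakeRepository.opens = 0
scope = RequestScope.new("repo.git")
%w[source-branch target-branch].each { |ref| scope.find_commit(ref) }
shared_opens = FakeRepository.opens

puts "fresh: #{fresh_opens}, shared: #{shared_opens}"
```

The per-call variant opens the repository once per lookup, while the request-scoped variant opens it once in total. The real cost difference depends on the repository and storage and is not measured here.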

Until we have fully proven out Gitaly atop NFS, we should have a feature flag that allows use of the Rugged implementations of the aforementioned RPCs.
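
A minimal sketch of what such a flag-guarded dispatch could look like, assuming an in-memory flag store. The class names, the flag name `:rugged_find_commit`, and the lambda implementations are all hypothetical and do not reflect GitLab's actual feature flag API or the merge requests listed above.

```ruby
# Hypothetical in-memory flag store; the flag name used below is illustrative.
class FeatureFlags
  def initialize
    @enabled = {}
  end

  def enable(name)
    @enabled[name] = true
  end

  def disable(name)
    @enabled.delete(name)
  end

  def enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Dispatches one call to either implementation, depending on the flag.
class CommitFinder
  def initialize(flags, rugged_impl:, gitaly_impl:)
    @flags = flags
    @rugged_impl = rugged_impl
    @gitaly_impl = gitaly_impl
  end

  # Default to the Gitaly path; use the Rugged path only when the flag is on.
  def find_commit(ref)
    if @flags.enabled?(:rugged_find_commit)
      @rugged_impl.call(ref)
    else
      @gitaly_impl.call(ref)
    end
  end
end

flags = FeatureFlags.new
finder = CommitFinder.new(
  flags,
  rugged_impl: ->(ref) { [:rugged, ref] }, # placeholder for a direct disk read
  gitaly_impl: ->(ref) { [:gitaly, ref] }  # placeholder for an RPC call
)

before = finder.find_commit("master")
flags.enable(:rugged_find_commit)
after = finder.find_commit("master")

puts "flag off: #{before.first}, flag on: #{after.first}"
```

Keeping the Gitaly path as the default means the Rugged path is strictly opt-in and can be switched off again without a deploy.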

/cc: @jacobvosmaer-gitlab, @jwoods06, @lbot, @dblessing, @tcooney

Edited Apr 18, 2019 by Stan Hu

Milestone: 11.9
Labels: backend, devopscreate, groupgitaly
Reference: gitlab-org/gitlab-foss#57317