
Housekeeping is not run on wiki repos, causing performance degradation over time

Summary

Housekeeping tasks run only on the project repo, not on its wiki repo. As a result, an active wiki slowly accumulates loose (unpacked) objects, which degrades performance.

Steps to reproduce

  1. Create a project wiki
  2. Perform a number of updates
  3. Manually trigger housekeeping
  4. Check <REPO_PATH>.wiki.git/objects: many loose objects are present and there is no packfile
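Step 4 can be checked with a small script. This is a minimal sketch that assumes the standard bare-repo layout, where loose objects live in objects/<2 hex chars>/<remaining hex> and packs in objects/pack/*.pack. The function name count_objects is hypothetical. Running `git count-objects -v` inside the repo reports the same figures in its `count` (loose objects) and `packs` fields.

```python
import os
import re

# Loose objects are stored under two-hex-character "fanout" directories.
_FANOUT_DIR = re.compile(r"^[0-9a-f]{2}$")
# Loose object file names are the remaining hex digits of the object ID.
_HEX_NAME = re.compile(r"^[0-9a-f]+$")


def count_objects(objects_dir):
    """Return (loose_object_count, packfile_count) for a bare repo's objects dir."""
    loose = 0
    for name in os.listdir(objects_dir):
        path = os.path.join(objects_dir, name)
        if _FANOUT_DIR.match(name) and os.path.isdir(path):
            # Count only hex-named files, ignoring temporary files git may leave.
            loose += sum(1 for f in os.listdir(path) if _HEX_NAME.match(f))

    packs = 0
    pack_dir = os.path.join(objects_dir, "pack")
    if os.path.isdir(pack_dir):
        packs = sum(1 for f in os.listdir(pack_dir) if f.endswith(".pack"))
    return loose, packs
```

Pass it the wiki repo's objects directory. On an affected wiki you would expect a large loose count and zero packs, and the loose count should drop sharply once the repo has been repacked.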

What is the current bug behavior?

Wikis become slow due to unpacked git objects.

What is the expected correct behavior?

Wikis do not slow down with use.

Relevant logs and/or screenshots

{
  "correlation_id": "ucjwMk0Oyz6",
  "error": "rpc error: code = Canceled desc = rpc error: code = Canceled desc = context canceled",
  "grpc.code": "Canceled",
  "grpc.meta.auth_version": "v2",
  "grpc.meta.client_name": "gitlab-web",
  "grpc.meta.deadline_type": "regular",
  "grpc.method": "WikiGetPageVersions",
  "grpc.request.deadline": "2021-01-04T09:14:16Z",
  "grpc.request.fullMethod": "/gitaly.WikiService/WikiGetPageVersions",
  "grpc.request.glProjectPath": "group/project-docs.wiki",
  "grpc.request.glRepository": "wiki-122",
  "grpc.request.repoPath": "@hashed/1b/e0/1be00341082e25c4e251ca6713e767f7131a2823b0052caf9c9b006ec512f6cb.wiki.git",
  "grpc.request.repoStorage": "default",
  "grpc.request.topLevelGroup": "@hashed",
  "grpc.service": "gitaly.WikiService",
  "grpc.start_time": "2021-01-04T09:13:46Z",
  "grpc.time_ms": 29511.725,
  "level": "info",
  "msg": "finished streaming call with code Canceled",
  "peer.address": "@",
  "pid": 2529,
  "span.kind": "server",
  "system": "grpc",
  "time": "2021-01-04T09:14:16.000Z"
}

From an strace of Gitaly-Ruby, showing 47,074 opens under the wiki's objects directory. Note that "dur" covers only the open calls; other syscalls account for the remainder of the time.

     pid      dur (ms)      first time          last time          open ct    directory name
  -------    ----------    ---------------    ---------------    ----------    --------------
   124134      1950.930    12:38:20.418809    12:39:02.486836         47074    /var/opt/gitlab/git-data/repositories/@hashed/1b/e0/1be00341082e25c4e251ca6713e767f7131a2823b0052caf9c9b006ec512f6cb.wiki.git/objects
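The issue does not name the tool that produced this table. The following is a hypothetical, minimal reimplementation of its "open ct", "dur", "first time" and "last time" columns. It assumes the trace was captured with `strace -f -tt -T`, so each line starts with the pid and a wall-clock timestamp and ends with the call duration in seconds. It also rolls loose-object fanout directories (objects/1b, objects/e0, ...) up into objects/, which is an assumption about how the table grouped paths.

```python
import os
import re
from dataclasses import dataclass

# Assumed `strace -f -tt -T` line shape, e.g.:
# 124134 12:38:20.418809 openat(AT_FDCWD, "/path/objects/1b/...", O_RDONLY) = 5 <0.000031>
LINE = re.compile(
    r'^(?P<pid>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+'
    r'(?:open|openat)\((?:AT_FDCWD,\s*)?"(?P<path>[^"]+)".*<(?P<dur>\d+\.\d+)>\s*$'
)
FANOUT = re.compile(r"^[0-9a-f]{2}$")


@dataclass
class DirStats:
    count: int = 0
    dur_ms: float = 0.0
    first: str = ""
    last: str = ""


def object_dir(path):
    """Directory of an opened file, with loose-object fanout dirs rolled up into objects/."""
    d = os.path.dirname(path)
    parent, leaf = os.path.split(d)
    if FANOUT.match(leaf) and os.path.basename(parent) == "objects":
        d = parent
    return d


def summarize(lines):
    """Aggregate open/openat calls per (pid, directory): count, total ms, first/last time."""
    stats = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # other syscalls, relative openat(fd, ...) and <unfinished ...> lines
        key = (int(m["pid"]), object_dir(m["path"]))
        s = stats.get(key)
        if s is None:
            s = stats[key] = DirStats(first=m["time"])
        s.count += 1
        s.dur_ms += float(m["dur"]) * 1000  # strace -T reports seconds
        s.last = m["time"]
    return stats
```

Rolling the fanout directories up makes the cost of loose-object lookups visible as a single row. Each lookup of a loose object opens its own file, whereas a packed object is read from an already-open packfile.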

Output of checks

This bug happens on GitLab.com

Results of GitLab environment info


System information
System:
Proxy:          no
Current User:   git
Using RVM:      no
Ruby Version:   2.7.2p137
Gem Version:    3.1.4
Bundler Version:2.1.4
Rake Version:   13.0.1
Redis Version:  5.0.9
Git Version:    2.29.0
Sidekiq Version:5.2.9
Go Version:     unknown

GitLab information
Version:        13.6.2-ee
Revision:       98aab73cbd5
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     11.9
Geo:            no
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version:        13.13.0
Repository storage paths:
- default:      /var/opt/gitlab/git-data/repositories
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell
Git:            /opt/gitlab/embedded/bin/git

Results of GitLab application Check


Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ...
GitLab Shell version >= 13.13.0 ? ... OK (13.13.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ...
default ... OK
remote ... OK
nfs ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ...
Running? ... yes
Number of Sidekiq processes ... 1

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
1/400 ... yes
1/401 ... yes
1/402 ... yes
1/403 ... yes
25/404 ... yes
1/407 ... yes
25/408 ... yes
1/411 ... yes
1/416 ... yes
1/417 ... yes
1/419 ... yes
1/421 ... yes
1/422 ... yes
Redis version >= 4.0.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.7.2)
Git version >= 2.29.0 ? ... yes (2.29.0)
Git user has default SSH configuration? ... yes
Active users: ... 8
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Elasticsearch version 7.x (6.4 - 6.x deprecated to be removed in 13.8)? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished

Possible fixes

Housekeeping currently runs only on the project repo. Running it on the wiki repo as well would be one way to resolve the issue.

However, if a project is used only for its wiki and its project repo is inactive, housekeeping would still never be triggered, because the repack threshold on the project repo would never be reached.
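One way to address both points is to count activity per repository, so wiki pushes alone can trigger housekeeping on the wiki. The sketch below is hypothetical: GitLab's actual trigger logic is not shown in this issue, and the class HousekeepingTracker, its record_push method and the period value are illustrative assumptions, not GitLab APIs or defaults.

```python
from collections import Counter


class HousekeepingTracker:
    """Hypothetical per-repository push counter (not GitLab's actual implementation)."""

    def __init__(self, period):
        self.period = period     # pushes between housekeeping runs (assumed value)
        self.pushes = Counter()  # pushes since last housekeeping, keyed by repo path

    def record_push(self, repo):
        """Count a push to `repo`; return True when housekeeping should run on it."""
        self.pushes[repo] += 1
        if self.pushes[repo] >= self.period:
            self.pushes[repo] = 0  # reset once housekeeping is scheduled
            return True
        return False
```

With `period=10`, recording 25 pushes to "group/project-docs.wiki" returns True on the 10th and 20th push, even though the project repo "group/project-docs" never receives a push. A counter keyed only on the project repo would never fire in that scenario.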

Edited by Will Chandler (ex-GitLab)