Creating project times out when group has many members
Summary
Creating projects in a group with many members times out.
Steps to reproduce
- Have a group with many members. We see this problem frequently in https://gitlab.com/gitlab-community which has (at the time of issue creation) 3975 members.
- Create a project
Example Group
https://gitlab.com/gitlab-community
What is the current bug behavior?
The request takes a very long time or times out. When it times out, follow-up actions such as repository creation or repository import are not triggered.
What is the expected correct behavior?
The request completes normally.
Relevant logs and/or screenshots
Example correlation ID: 01JNS3SZJY4MBAAQYMPWFVXV2B
If someone wants the JSON file from the performance bar, let me know. It is about 70 MB, which is too big to attach here or paste into a private snippet. It has been provided in internal note #523919 (comment 2395846189).
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
Judging from the performance bar output, the deduplication of `AuthorizedProjectUpdate::UserRefreshFromReplicaWorker` jobs appears to be the problem. The deduplication checks the current WAL location for every single job that gets enqueued, and all of that happens inside the project creation request.
The immediate problem could be fixed by moving the enqueuing of that worker into another worker, which takes it out of the request. The worker is already enqueued with a fairly long delay, so I don't see a problem if it runs slightly later because it was scheduled through another worker.
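To illustrate the idea, here is a minimal plain-Ruby sketch. It is not GitLab's code and does not use the Sidekiq API: `FakeQueue`, `USER_REFRESH`, `FAN_OUT` and both `create_project_*` methods are hypothetical names, and it assumes one refresh job is enqueued per group member. The `dedup_checks` counter stands in for the per-job WAL location lookup, so the sketch only compares how many of those checks land inside the request.

```ruby
# Hypothetical stand-in for the job queue; not GitLab's or Sidekiq's API.
class FakeQueue
  attr_reader :jobs, :dedup_checks

  def initialize
    @jobs = []
    @dedup_checks = 0
  end

  # Every enqueue pays for one deduplication check, which stands in for
  # the per-job WAL location lookup described above.
  def enqueue(worker, *args)
    @dedup_checks += 1
    @jobs << [worker, args]
  end

  # Run queued jobs until none are left, as a background process would.
  def drain
    until @jobs.empty?
      worker, args = @jobs.shift
      worker.call(self, *args)
    end
  end
end

# Placeholder for the per-user refresh; it does no work in this sketch.
USER_REFRESH = ->(_queue, _user_id) {}

# Hypothetical fan-out worker: enqueues one refresh job per member,
# but does so in the background rather than in the web request.
FAN_OUT = lambda do |queue, member_ids|
  member_ids.each { |id| queue.enqueue(USER_REFRESH, id) }
end

# Current behavior as described: the request enqueues every job itself.
def create_project_inline(queue, member_ids)
  member_ids.each { |id| queue.enqueue(USER_REFRESH, id) }
  queue.dedup_checks # checks paid for during the request
end

# Proposed behavior: the request enqueues a single fan-out job.
def create_project_deferred(queue, member_ids)
  queue.enqueue(FAN_OUT, member_ids)
  queue.dedup_checks # checks paid for during the request
end

member_ids = (1..3975).to_a # group size quoted in the steps to reproduce

inline_queue = FakeQueue.new
inline_checks = create_project_inline(inline_queue, member_ids)

deferred_queue = FakeQueue.new
deferred_checks = create_project_deferred(deferred_queue, member_ids)
deferred_queue.drain # the fan-out and refresh jobs run outside the request

puts "in-request checks, inline:   #{inline_checks}"
puts "in-request checks, deferred: #{deferred_checks}"
puts "total checks after drain:    #{deferred_queue.dedup_checks}"
```

The total amount of deduplication work does not shrink in this sketch; it only moves out of the request, which is what the proposal above asks for. Reducing the per-job cost itself would be a separate change.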
