Performance issues with many users in a group with many projects

On git.drupalcode.org, we want anyone to be able to pick up a merge request and work on it. It is common for multiple people to collaborate on resolving an issue. We tried to achieve this by adding all 65,369 users to a group that contained all 27,720 fork projects. https://www.drupal.org/project/drupalorg/issues/3313979 has some background.

As the bulk update added ~14,000 people to the group, it became clear that this was not a workable solution. We were able to work through the Sidekiq queues, but creation of new fork projects was not completing. We are shelving the idea and would like to reverse the bulk addition.

Unfortunately, as we removed ~3,000 people from the group, it became clear we were in a worse situation. Each TodosDestroyer::EntityLeaveWorker Sidekiq job looks like it enqueues a TodosDestroyer::PrivateFeaturesWorker job for each project in the group, so the removals were attempting to queue over 83M jobs (~3,000 users × 27,720 projects).
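
For scale, the 83M figure can be reproduced from the approximate counts above. This is only a back-of-the-envelope sketch: it assumes exactly one PrivateFeaturesWorker job per project per removed user, which is our reading of the behavior and not something confirmed against the GitLab source. The variable names are ours.

```ruby
# Rough estimate of the job fan-out, using the approximate counts from this report.
removed_users = 3_000    # approximate number of users removed so far
fork_projects = 27_720   # fork projects in the group

# Assumption: each removal enqueues one PrivateFeaturesWorker job per project.
estimated_jobs = removed_users * fork_projects
puts estimated_jobs  # 83160000
```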

TodosDestroyer::EntityLeaveWorker jobs, when executing in parallel, would take tens of minutes and lead to Sidekiq restarting itself due to memory issues. I see Gitlab::SidekiqDaemon::MemoryKiller in the logs, reporting that the 2,000,000 limit was exceeded. I suspect this was more a symptom of Sidekiq being overloaded than of Postgres queries.

TodosDestroyer::PrivateFeaturesWorker jobs complete without trouble; there are just a lot of them.

Since EntityLeaveWorker was crashing, PrivateFeaturesWorker jobs were too numerous, and these are all public projects that should not have To-Dos for newly added users, we decided to remove these jobs from the queue with:

# `queue` is the Sidekiq queue object defined earlier in the console session (not shown).
# Delete the group-level EntityLeaveWorker jobs, printing their arguments as a record.
queue.each do |job|
  if job.klass == 'TodosDestroyer::EntityLeaveWorker' && job.args[2] == 'Group'
    puts job.args.join(',')
    job.delete
  end
end

# Delete all PrivateFeaturesWorker jobs, printing their arguments as a record.
queue.each do |job|
  if job.klass == 'TodosDestroyer::PrivateFeaturesWorker'
    puts job.args.join(',')
    job.delete
  end
end
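
Before deleting, it may help to see how many jobs of each class are actually in the queue. The following is a minimal sketch, not something we ran in production: `tally_by_class` and `FakeJob` are hypothetical names introduced here, and the helper only assumes that each job responds to `klass`, as the `queue` entries above do.

```ruby
# Count jobs per worker class without deleting anything.
# `jobs` is any enumerable whose elements respond to #klass, such as the
# `queue` object used above.
def tally_by_class(jobs)
  counts = Hash.new(0)
  jobs.each { |job| counts[job.klass] += 1 }
  counts
end

# Stand-in job objects for illustration only (not real Sidekiq records).
FakeJob = Struct.new(:klass, :args)
sample = [
  FakeJob.new('TodosDestroyer::EntityLeaveWorker', [1, 2, 'Group']),
  FakeJob.new('TodosDestroyer::PrivateFeaturesWorker', [3]),
  FakeJob.new('TodosDestroyer::PrivateFeaturesWorker', [4])
]
tally_by_class(sample).each { |klass, count| puts "#{klass}: #{count}" }
```

In a console session the same helper could be given the real queue object in place of `sample`.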

Aside from taking a while to go through our huge queue, I think these have been effective at resolving the immediate problems.

Once the Sidekiq queue has settled down, I fear we will still have various issues because the remaining ~11,000 users are still in the group. We can severely throttle the removals, but that would likely take days.
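
To make the throttling idea concrete, here is a minimal sketch of batching with a pause between batches. `throttled_each` is a hypothetical helper, not a GitLab API, and the block body is a placeholder: the actual removal call is exactly what we are unsure about. The `pause` parameter exists only so the pacing can be dry-run without sleeping.

```ruby
# Process items in small batches, pausing between batches.
# `pause` is any callable taking a number of seconds; it defaults to sleeping
# and can be replaced for a dry run.
def throttled_each(items, batch_size:, pause_seconds:, pause: ->(s) { sleep(s) })
  items.each_slice(batch_size).with_index do |batch, index|
    pause.call(pause_seconds) if index.positive?
    batch.each { |item| yield item }
  end
end

# Dry run with placeholder IDs; the block is where a real removal would go.
removed = []
pauses = []
throttled_each((1..5).to_a, batch_size: 2, pause_seconds: 60,
               pause: ->(s) { pauses << s }) do |user_id|
  removed << user_id  # hypothetical: remove this user from the group here
end
puts "removed #{removed.size} users with #{pauses.size} pauses"
```

As an illustration of the cost, with made-up pacing of 5 users every 2 minutes, ~11,000 users would need about 2,200 batches, or roughly three days.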

Is there a safe way to bulk remove users from a group? For example, via a SQL query, or some other way that does not queue these problematic jobs?

We are using GitLab 15.5.2-ee