GPG signature feature causing out-of-control Sidekiq queues
The new GPG signature feature caused a huge backlog in our Sidekiq queues.
The `create_gpg_signature` queue is handled by the Sidekiq best-effort nodes, which do not have the capacity to handle this queue.
A larger point is that this feature has a number of production-level issues:
1. Every commit in a push is scheduled with a new `CreateGpgSignatureWorker` worker
2. Each `CreateGpgSignatureWorker` worker requires loading the repository from Rugged and looking up the commit
3. Each commit attempts to extract the signature from the commit
4. If the signature is present, do the GPG verification work
The problem is that steps 2 and 3 are quite expensive, and most of the time there are no signatures to find. For example, verifying 100 commits for a single push with a 2 GB packfile would require reading this packfile 100 times.
For processing references in commit messages, we have a number of optimizations that reduce the amount of work:
- We limit the number of commits per push to the last 100
- We scan the commit message to see if it contains a possible reference regex
I hope we can do something similar with GPG signatures.
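A minimal sketch of what the equivalent optimizations could look like for signatures, assuming the push service already has the raw commit objects at hand. Git stores a commit signature in a `gpgsig` header of the commit object, so a cheap string check can rule out unsigned commits before any worker is scheduled. The names `MAX_COMMITS_PER_PUSH`, `possibly_signed?`, and `shas_to_verify` are hypothetical and are not part of the current code:

```ruby
# Hypothetical helpers for illustration only; not the existing implementation.
MAX_COMMITS_PER_PUSH = 100

# Returns true if the raw commit object carries a `gpgsig` header.
# Only the header section (everything before the first blank line) is
# inspected, so the text "gpgsig" inside a commit message does not match.
def possibly_signed?(raw_commit)
  headers = raw_commit.split("\n\n", 2).first.to_s
  headers.each_line.any? { |line| line.start_with?("gpgsig ") }
end

# commits: array of hashes like { sha: "...", raw: "..." }, oldest first.
# Keeps only the last `limit` commits, then only those that may be signed.
def shas_to_verify(commits, limit: MAX_COMMITS_PER_PUSH)
  commits.last(limit)
         .select { |commit| possibly_signed?(commit[:raw]) }
         .map { |commit| commit[:sha] }
end
```

With a filter like this, only the returned SHAs would need a `CreateGpgSignatureWorker` job. In the common case of a push with no signed commits, nothing would be enqueued at all.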
For now, I propose:
- Hotpatch GitLab.com to remove the `update_signatures` call in `git_push_service.rb`
- Drop the `create_gpg_signature` Sidekiq queue
- Put a feature flag around this for RC3, disabled by default
- Make this feature work in a performant manner
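
As a rough illustration of the feature-flag step, the sketch below guards the scheduling call behind a flag that is off unless explicitly enabled. `FeatureFlags`, `process_push`, and the `:gpg_signatures` flag name are all hypothetical stand-ins. A real change would use the application's own flag mechanism and wrap the actual `update_signatures` call:

```ruby
# Hypothetical in-memory flag store, used here only to show the
# "disabled by default" behaviour.
class FeatureFlags
  def initialize
    # Every flag reads as false until it is explicitly enabled.
    @enabled = Hash.new(false)
  end

  def enable(name)
    @enabled[name.to_sym] = true
  end

  def enabled?(name)
    @enabled[name.to_sym]
  end
end

# Stand-in for the push hook: schedules signature work only when the
# flag is on. `scheduler` is any callable that takes a commit SHA.
def process_push(commit_shas, flags:, scheduler:)
  return [] unless flags.enabled?(:gpg_signatures)

  commit_shas.each { |sha| scheduler.call(sha) }
end
```

With the flag left at its default, a push schedules no signature jobs, so the queue stays empty until the feature is deliberately turned on.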
/cc: @dzaporozhets, @DouweM, @pcarranza, @sitschner, @koffeinfrei