In &469, we're moving from a one-worker-one-queue model to many-workers-one-queue. Some workers can't be easily migrated. That means we need to either exclude them from this work or fix them before migrating them to a shared queue.
Check their own queue size
These workers depend on checking their own queue size, which won't work in this model.
Keep it as it is. We would need to make some noise around this when we make &194 or &447 available to self-managed installations who could also be using this.
Jobs written to Redis without passing through the application
EmailReceiverWorker
ServiceDeskEmailReceiverWorker
The EmailReceiverWorker and ServiceDeskEmailReceiverWorker rely on our use of mail_room to take emails from an IMAP mailbox and push them into Sidekiq. mail_room does not use our application code directly, and it does not run our Sidekiq client middleware.
We already have these queue names in our application (to pass to mail_room) in Gitlab::MailRoom::ADDRESS_SPECIFIC_CONFIG, so we could use the worker routing there, but that is probably a task of its own rather than one we want to bundle in with other queues: #1263 (closed)
@smcgivern Initially we should exclude them from the &447 epic (which you have already done), but longer-term we need to have them fixed so that there is only one method of using Sidekiq. I'm concerned that if we have some workers in their own queues that new workers might be added with this model.
@reprazent (or @smcgivern) Why are these workers checking the queue length? What do they do with the length information? I'm not looking for a detailed description of all of them, just an idea of why one might need this.
@rnienaber typically they want to do some kind of rate control or throttling, so that we don't have too many expensive workers running at the same time.
For instance, the hashed storage stuff uses it to prevent any concurrency: if you try to schedule some hashed storage migrations while some are in progress, it will not be permitted.
I created gitlab-org/gitlab!66553 (merged) to tag these. As @cmiskell pointed out in Slack, this simplifies the routing rules because we can just put this at the top:
["tags=needs_own_queue",nil]
It doesn't simplify the shard queue selectors, unfortunately, but that's still a win.
In gitlab-org/gitlab!66553 (merged), Heinrich pointed out that in the migration doc, we are guiding the users to check for the queue length of BackgroundMigrationWorker beforehand. It looks like the queue-size checking method is recommended for GitLab < 12.8. Since GitLab 12.9, we have can use Gitlab::BackgroundMigration.remaining, which is more accurate. Before this version, Sidekiq routing rule feature is not available either.
However, the doc is not super clear, and a little confusing if the users use both method. We would need to emphasize the versioning difference a little bit.
It does, but it also iterates over the entire scheduled set (which is shared between queues). We could do the same for whatever the actual queue used by BackgroundMigrationWorker is, which would then mean background migrations wouldn't need their own queue I think.
@engwan@qmnguyen0711 is that right? If it is then I can create an MR to let Gitlab::BackgroundMigration.remaining use that method, and then we can say background migrations can share a queue. I'll also update the docs to remove the pre-12.9 method of finding this out.
It does, but it also iterates over the entire scheduled set (which is shared between queues). We could do the same for whatever the actual queue used by BackgroundMigrationWorker is, which would then mean background migrations wouldn't need their own queue I think.
Agree. I think this method is definitely better, as it does not depend on the queue. Iterating through the whole scheduled set and queue seems to be expensive, but I think that's properly okay in this case. We don't run this command in a daily basis though.
@rnienaber I'll assign to myself and take it out of the epic. I need to make sure we've cross-referenced the correct issues in the codebase, and probably we need to document this case, but we have all the problem workers captured here.
I created gitlab-org/gitlab!70069 (merged) to update the code comments to point to the correct issues (all are linked to this issue, too) and document this for self-managed instances.
Gitlab::SidekiqQueue. AFAIK, this class is used by admin users (us in precise) to drop the jobs from a particular queue. It's still working, but as the queue will mean a totally different thing, we should either implement a replacement. This should be mentioned in #997 (closed) as well.
I think we can keep this API and just add a worker parameter, I'll create an MR for that.