Implement Sidekiq queue re-routing in the application
We want to allow a configurable mapping between queue selectors and queue names. In gitlab.yml, gitlab.rb, etc. we will support an array of tuples like so:
[
["resource_boundary=memory", null],
["*", "default"]
]
The first element in each tuple is a selector (including the special * selector). The second element is a queue name. A nil queue name (null in the configuration above) means that the rule matches but the queue name does not change: the worker keeps whatever queue name it would otherwise have.
Scope
This issue is scoped solely to allowing this inside the Rails application: if you configure it in gitlab.yml, it will work. We should put this behind a feature flag and document it as part of this issue.
Implementation
Rules are evaluated from first to last, and as soon as we find a match for a given worker we stop processing for that worker (first match wins).
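The first-match-wins evaluation can be sketched as follows. This is a minimal illustration, not the planned implementation: QueueRouter, its route method, and the lambda-based matchers are hypothetical names, and real selector parsing is left out.

```ruby
# Hypothetical router; the class and method names are illustrative only.
class QueueRouter
  # rules: array of [matcher, queue_name] pairs, in priority order.
  # Each matcher responds to #call(attributes) and returns true or false.
  def initialize(rules)
    @rules = rules
  end

  # Returns the queue for a worker. Falls back to default_queue when the
  # first matching rule has a nil queue name, or when no rule matches.
  def route(attributes, default_queue)
    _, queue = @rules.find { |matcher, _| matcher.call(attributes) }
    queue || default_queue
  end
end

memory_bound = ->(attrs) { attrs[:resource_boundary] == :memory }
wildcard = ->(_attrs) { true }

router = QueueRouter.new([[memory_bound, nil], [wildcard, "default"]])

# Matches the first rule, whose queue is nil, so the existing name is kept.
kept = router.route({ resource_boundary: :memory }, "project_export")
# Falls through to the wildcard rule and is routed to "default".
rerouted = router.route({ resource_boundary: :cpu }, "pipeline_processing")
```

Because find stops at the first pair whose matcher returns true, later rules are never consulted for that worker.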
To actually convert the rules into something usable, the preferred option is to process these rules on application start and update the Sidekiq options for our worker classes to match. This means that everything will 'just work' from Sidekiq's perspective. One way of getting all of our workers is:
ObjectSpace.each_object(Class).select { |k| k < ApplicationWorker }
But there may be a better way.
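As a rough sketch of the start-up approach, the snippet below walks all worker classes and overwrites their queue option. To keep it runnable without Rails or Sidekiq, ApplicationWorker here is a small stand-in that only models sidekiq_options and get_sidekiq_options; the two worker classes and the reroute_all_workers helper are hypothetical.

```ruby
# Stand-in for the real ApplicationWorker concern, so the sketch runs on its
# own. Only the option storage needed for this example is modelled.
module ApplicationWorker
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def sidekiq_options(opts = {})
      @sidekiq_options = (@sidekiq_options || {}).merge(opts)
    end

    def get_sidekiq_options
      @sidekiq_options || {}
    end
  end
end

# Hypothetical workers used only for illustration.
class ExportWorker
  include ApplicationWorker
  sidekiq_options queue: "export"
end

class MailWorker
  include ApplicationWorker
  sidekiq_options queue: "mail"
end

# Hypothetical routing step: yields each worker class and stores the queue
# name the block returns. A nil result leaves the existing queue untouched.
def reroute_all_workers
  ObjectSpace.each_object(Class).select { |k| k < ApplicationWorker }.each do |worker|
    queue = yield(worker)
    worker.sidekiq_options(queue: queue) if queue
  end
end

reroute_all_workers { |worker| worker == MailWorker ? "default" : nil }
```

Running this once at boot means an invalid rule would surface immediately, before any job is scheduled.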
If this isn't feasible, we can consider a Sidekiq client middleware to change the queue that a job is pushed into. However, this will have some disadvantages:
- Middlewares take a queue argument, which will be wrong if we only make the change in a middleware.
- We'll only hit errors when a job is scheduled, whereas if we process all the rules up front we'll find errors early.
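For comparison, the middleware alternative might look roughly like this. Sidekiq client middlewares implement call(worker_class, job, queue, redis_pool) and yield to continue the chain; the class name QueueRerouteMiddleware and the injected router lambda are assumptions, and the sketch assumes the client pushes to whatever job["queue"] holds once the chain has run. Registration with Sidekiq is omitted so the example runs standalone.

```ruby
# Hypothetical client middleware; not part of Sidekiq or GitLab.
class QueueRerouteMiddleware
  # router: any object responding to #call(worker_class) that returns a
  # queue name, or nil to leave the job's queue unchanged.
  def initialize(router)
    @router = router
  end

  def call(worker_class, job, queue, _redis_pool)
    new_queue = @router.call(worker_class)
    job["queue"] = new_queue if new_queue
    # The `queue` argument still holds the old name at this point, which is
    # the first disadvantage listed above.
    yield
  end
end

router = ->(worker_class) { worker_class.to_s == "MailWorker" ? "default" : nil }
middleware = QueueRerouteMiddleware.new(router)

job = { "class" => "MailWorker", "queue" => "mail" }
middleware.call("MailWorker", job, "mail", nil) { job }

untouched = { "class" => "ExportWorker", "queue" => "export" }
middleware.call("ExportWorker", untouched, "export", nil) { untouched }
```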
More background
More details from #987 (closed):
The worker routing rules are an array of tuples. Each tuple includes a selector and a corresponding queue. First match wins. If a worker doesn't match any selector, its queue is translated from the worker name. If a worker matches a selector but the value is null, it uses the translated queue name. Otherwise, it uses the queue name specified next to the selector. The reason we use an array instead of an object is that the JSON specification doesn't guarantee key ordering for objects, which may affect the matching order.
On GitLab.com, replicate the sharding configuration in production. All the queue names are set to null. As a result, after the routing logic is deployed and/or the configuration is set, all of the jobs are routed to worker-name queues, just like before.
[
["resource_boundary=memory", null],
["feature_category=database&urgency=throttled", null],
["feature_category=gitaly&urgency=throttled", null],
["feature_category=global_search&urgency=throttled", null],
["resource_boundary=cpu&urgency=default,low", null],
["resource_boundary=cpu&urgency=high&tags!=requires_disk_io", null],
["resource_boundary!=cpu&urgency=high", null],
["*", null]
]
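To make the selector syntax in this configuration concrete, here is a minimal matcher covering only the forms that appear above: * matches everything, & joins conditions that must all hold, a comma separates alternative values, and != negates a condition. The function name selector_matches? and the attribute hash shape are assumptions; the real matching logic extracted from the CLI may support more operators and behave differently.

```ruby
# Hypothetical matcher for the selector forms shown in this issue only.
def selector_matches?(selector, attributes)
  return true if selector == "*"

  selector.split("&").all? do |condition|
    parsed = condition.match(/\A(\w+)(!=|=)(.+)\z/)
    raise ArgumentError, "invalid condition: #{condition}" unless parsed

    key, operator, values = parsed.captures
    # Attributes may be a single value (urgency) or a list (tags).
    actual = Array(attributes[key.to_sym]).map(&:to_s)
    matched = (values.split(",") & actual).any?
    operator == "=" ? matched : !matched
  end
end

worker = { resource_boundary: :cpu, urgency: :high, tags: [:requires_disk_io] }

# false: the worker carries the requires_disk_io tag
selector_matches?("resource_boundary=cpu&urgency=high&tags!=requires_disk_io", worker)
# false: urgency is high, not default or low
selector_matches?("resource_boundary=cpu&urgency=default,low", worker)
# true: the wildcard matches every worker
selector_matches?("*", worker)
```

Raising on a malformed condition fits the goal of finding configuration errors when the rules are processed, rather than when a job is scheduled.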
(Optional) We could add the matching selector to the Sidekiq structured logs. After the change is deployed, even though the jobs are still routed the same as before, we can verify whether each worker matches the selector we expect. This boosts our confidence.
Iterations
- Extract query matching logic out of CLI: gitlab-org/gitlab!59550 (merged)
- Implement queue router