Release plan for making Sidekiq Cluster enabled by default
In gitlab-com/gl-infra/scalability#198 (closed) and the related MRs, I'm adding Sidekiq Cluster support to our Helm charts. We have similar work underway for GDK, source installs, and Omnibus in gitlab-com/gl-infra&181 (closed).
Why use Sidekiq Cluster?
The main advantages of using Sidekiq Cluster are:
- You can easily run more than one Sidekiq process.
- It supports negating a selection of queues.
- It supports an experimental queue selector syntax that we only expect to be used on GitLab.com.
- By having one entry point, it's easier to make future changes.
The first two points don't actually apply to Helm charts: for now, we're going with one process per container (this may change), and Helm charts already support negating queues. The third point is nice, but it doesn't require Sidekiq Cluster to be the default.
Why make it the default?
This means that the main reason we'd like to make Sidekiq Cluster the default is the fourth point, plus the 'that's what we do everywhere else' factor.
When do we want to make it the default?
For all other installation methods, we're targeting 13.0. For Helm charts, as the versioning is different, I wanted to ask in this issue whether that's feasible.
What is the problem with making it the default?
Omnibus and source installs don't currently support selecting or negating queues, so using Sidekiq Cluster simply adds a feature, and other settings should port over automatically. The Helm charts do support these features (which is good!) but in a way that's not compatible with the way Sidekiq Cluster does it.
Charts select queues using an array, which may be a two-dimensional array if it contains queue weights. So it could be ["merge", "post_receive"] or [["merge", 2], ["post_receive", 1]]. (As an aside, there are some small bugs in the way negation works right now in the Helm charts that are largely due to our complicated queue setup in the application.)
Sidekiq Cluster selects queues using a comma-separated list of queue names, with no weights.
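To make the mismatch concrete, here is a minimal Python sketch of the two shapes described above. The function names are hypothetical and exist only for illustration; neither the charts nor Sidekiq Cluster exposes them. Representing a missing weight as None is also an assumption of this sketch.

```python
from typing import List, Optional, Tuple, Union

# A chart-style entry is either a bare queue name or a [name, weight] pair.
ChartQueue = Union[str, List[Union[str, int]]]


def chart_queue_entries(queues: List[ChartQueue]) -> List[Tuple[str, Optional[int]]]:
    """Normalize a chart-style array into (name, weight) pairs.

    The weight is None when the entry is a bare name (an assumption of
    this sketch, not a documented default).
    """
    entries: List[Tuple[str, Optional[int]]] = []
    for item in queues:
        if isinstance(item, str):
            entries.append((item, None))
        else:
            name, weight = item
            entries.append((str(name), int(weight)))
    return entries


def cluster_queue_names(selection: str) -> List[str]:
    """Split a comma-separated queue selection into queue names."""
    return [name.strip() for name in selection.split(",") if name.strip()]
```

The second function has nowhere to return a weight from, which is the incompatibility in a nutshell: the comma-separated form carries names only.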
What options do we have?
I'm not an expert here, but I see three main options:
- Make Sidekiq Cluster the default, bumping the versioning as needed.
- Pros: very simple.
- Cons: this will break some existing configurations. The breakage is easy enough to fix, but it will be a hard failure.
- Don't make Sidekiq Cluster the default.
- Pros: also very simple.
- Cons: Helm charts will be different to every other installation method. Also, we eventually (14.0) want to make Sidekiq Cluster the only method for launching Sidekiq, and this would delay that work.
- Try to automatically convert configurations.
- This is possible, although you'd lose the weights. Both ["merge", "post_receive"] and [["merge", 2], ["post_receive", 1]] would become "merge,post_receive".
- Pros: mostly Just Works.
- Cons: more complicated to implement, more complicated to communicate.
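The automatic conversion in the third option could look roughly like the Python sketch below. It is an illustration of the idea only, not the charts' actual implementation (which would live in Helm templates); the function name and the returned "weights were dropped" flag are assumptions added here so that a caller could warn the user.

```python
from typing import List, Tuple, Union

ChartQueue = Union[str, List[Union[str, int]]]


def to_cluster_queue_list(queues: List[ChartQueue]) -> Tuple[str, bool]:
    """Convert a chart-style queue array to a comma-separated name list.

    Returns the joined string plus a flag that is True when any weight
    had to be discarded, so the caller can surface a warning.
    """
    names: List[str] = []
    dropped_weights = False
    for item in queues:
        if isinstance(item, str):
            # Plain form: ["merge", "post_receive"]
            names.append(item)
        elif isinstance(item, (list, tuple)) and len(item) == 2:
            # Weighted form: [["merge", 2], ["post_receive", 1]]
            names.append(str(item[0]))
            dropped_weights = True
        else:
            raise ValueError(f"unsupported queue entry: {item!r}")
    return ",".join(names), dropped_weights
```

Raising on an unrecognized entry rather than guessing keeps the conversion from silently producing a queue list the user did not ask for.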