Add traceability to Sidekiq worker type feature flags
The run_sidekiq_jobs_{SomeWorker} and drop_sidekiq_jobs_{SomeWorker} feature flags (FFs) control whether a worker defers (run_sidekiq_jobs FF set to false) or drops (drop_sidekiq_jobs FF set to true) the execution of its jobs during an incident. The mechanism is described in this runbook.
Because each FF name is generated dynamically from a worker name, the FFs have no traceable definitions, i.e. they have no YAML definition files the way regular FFs do. This concern was raised in gitlab-org/gitlab!123762 (comment 1566611529).
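To make the naming concrete, here is a minimal Ruby sketch of how a per-worker flag name could be derived and turned into a defer/drop decision. The FLAGS hash, the DEFAULTS values, and the flag_name, flag_enabled? and decide helpers are all hypothetical stand-ins for illustration, not GitLab's actual implementation. The order of the checks (drop before defer) is also an assumption.

```ruby
# Hypothetical in-memory flag store standing in for the real feature flag backend.
FLAGS = {
  "run_sidekiq_jobs_SlowWorker" => false,    # defer SlowWorker jobs
  "drop_sidekiq_jobs_BrokenWorker" => true   # drop BrokenWorker jobs
}.freeze

# Assumed defaults: jobs run unless told otherwise and are not dropped.
DEFAULTS = { "run_sidekiq_jobs" => true, "drop_sidekiq_jobs" => false }.freeze

# Build the flag name from a prefix and the worker class name.
def flag_name(prefix, worker_name)
  "#{prefix}_#{worker_name}"
end

# Look up a flag, falling back to the assumed default for its prefix.
def flag_enabled?(prefix, worker_name)
  FLAGS.fetch(flag_name(prefix, worker_name), DEFAULTS.fetch(prefix))
end

# Returns :drop, :defer or :run for the given worker.
# Checking drop before defer is an assumption of this sketch.
def decide(worker_name)
  return :drop if flag_enabled?("drop_sidekiq_jobs", worker_name)
  return :defer unless flag_enabled?("run_sidekiq_jobs", worker_name)

  :run
end
```

Because the names only exist as interpolated strings, nothing in the repository lists them, which is the traceability gap described above.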
Proposals
- Use a single FF with a new dedicated actor (rough implementation in gitlab-org/gitlab!120338 (closed)).
  Pros:
  - Clean FF definition. We'll end up with only 2 YAML files.
  Cons:
  - No %-based rollout, which is handy during an incident for safely releasing the problematic worker back to execution.
- Define a YML definition per worker per FF in config/feature_flags/workers/. Then add Rake tasks to generate the YML automatically and to check it in CI when a worker is added or removed. This would be similar to the Rake tasks that maintain all_queues.yml: bin/rake gitlab:sidekiq:all_queues_yml:generate and bin/rake gitlab:sidekiq:all_queues_yml:check.
  Pros:
  - Addresses the concern by giving every FF a definition.
  Cons:
  - We'll immediately have ~1300 YAML files in config/feature_flags/workers/ (664 workers at the time of writing * 2 FFs).
  - Every MR that adds or removes workers will have to check 2 YAML files per worker.
- Define a single YML file listing all workers.
  Pros:
  - Clean FF definition of 2 files.
  Cons:
  - Violates the 1-FF-per-file convention. Might require significant refactoring.
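The generate/check pair from the second proposal could be sketched roughly as follows. This is an illustration under stated assumptions, not the existing all_queues_yml tasks: the helper names (expected_definitions, generate, up_to_date?) are hypothetical, the worker list is passed in rather than discovered, and the fields written to each file are placeholders rather than the real FF definition schema.

```ruby
require "yaml"
require "tmpdir"
require "fileutils"

FLAG_PREFIXES = %w[run_sidekiq_jobs drop_sidekiq_jobs].freeze

# Expected definition files as { file name => YAML content }.
# The fields written here are placeholders, not the real definition schema.
def expected_definitions(worker_names)
  worker_names.product(FLAG_PREFIXES).to_h do |worker, prefix|
    name = "#{prefix}_#{worker}"
    ["#{name}.yml", { "name" => name, "worker" => worker }.to_yaml]
  end
end

# "generate": write every expected file and delete files for removed workers.
def generate(dir, worker_names)
  FileUtils.mkdir_p(dir)
  expected = expected_definitions(worker_names)
  expected.each { |file, body| File.write(File.join(dir, file), body) }
  stale = Dir.children(dir) - expected.keys
  stale.each { |file| File.delete(File.join(dir, file)) }
end

# "check": true only when the directory matches the worker list exactly.
def up_to_date?(dir, worker_names)
  expected = expected_definitions(worker_names)
  return false unless Dir.exist?(dir)
  return false unless Dir.children(dir).sort == expected.keys.sort

  expected.all? { |file, body| File.read(File.join(dir, file)) == body }
end

# Example run against a temporary directory with two made-up workers.
Dir.mktmpdir do |dir|
  generate(dir, %w[AlphaWorker BetaWorker])
  puts Dir.children(dir).sort
  puts up_to_date?(dir, %w[AlphaWorker BetaWorker GammaWorker]) # a new worker makes the check fail
end
```

In CI, the check would fail whenever a worker is added or removed without regenerating the files. With 664 workers the same product yields 664 * 2 = 1328 definitions, which is where the ~1300 figure comes from.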