Migrate batches of workers to default
Once we have done #1073 (closed) and #1072 (closed), we can start migrating actually useful workers. A simple way to pick a slice of workers would be to go by feature category, as it:
- Allows us to grab a subset of workers.
- Means there will be a single team affected if something goes wrong.
- Keeps the selectors relatively simple.
We can start with low-volume feature categories and work up from there.
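As a rough illustration of choosing slices, the sketch below groups workers by feature category and orders the categories from lowest to highest job volume. The worker names, categories, and volumes are made up for the example; real figures would come from logs or metrics.

```python
from collections import defaultdict

# Made-up (worker, feature_category, jobs_per_day) figures, for illustration only.
workers = [
    ("FooWorker", "foo", 120),
    ("FooCleanupWorker", "foo", 30),
    ("BarWorker", "bar", 50_000),
    ("BazWorker", "baz", 900),
]


def categories_by_volume(rows):
    """Sum job volume per feature category and sort ascending by volume."""
    totals = defaultdict(int)
    for _worker, category, jobs in rows:
        totals[category] += jobs
    return sorted(totals.items(), key=lambda item: item[1])


# Lowest-volume categories come first, so they are migrated first.
migration_order = [category for category, _ in categories_by_volume(workers)]
```

With these made-up numbers, `foo` would be migrated before `baz`, and `bar` last.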
Migration steps
- Add the routing configuration, for instance ["feature_category=foo", "default"].
  - This new rule goes immediately above the final ["*", null] rule, i.e. after all the other shards but before the final fall-through option.
  - It is optional whether we construct a single large rule of all the selectors, or add individual rules (they apply in order, and computation cost is largely equivalent either way). Use whatever looks more readable (this may vary over the steps, depending on the expressions necessary).
- Validate that the jobs are processed through the default queue using logs and metrics.
- Migrate any scheduled and to-be-retried jobs using the Rake task from gitlab-org/gitlab!60724 (merged).
- Stop listening to the old queues by adding that feature category (or other expression) to the negated selectors for catchall k8s. This is the payoff for this work as it will reduce saturation on redis-sidekiq once we remove enough queues.
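The rule ordering in the first step can be sketched as follows. This is a simplified model written for this description, not GitLab's actual router: it assumes first-match-wins evaluation, treats a null queue as "keep the per-worker queue", and supports only the selectors `*`, `feature_category=a,b` and `feature_category!=a,b`. The `search` shard rule and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical, simplified model of ordered routing rules: each rule is
# (selector, queue). A queue of None stands for "keep the per-worker queue".
Rule = tuple[str, Optional[str]]


def matches(selector: str, feature_category: str) -> bool:
    """Support only '*', 'feature_category=a,b' and 'feature_category!=a,b'."""
    if selector == "*":
        return True
    if selector.startswith("feature_category!="):
        values = selector[len("feature_category!="):].split(",")
        return feature_category not in values
    if selector.startswith("feature_category="):
        values = selector[len("feature_category="):].split(",")
        return feature_category in values
    raise ValueError(f"unsupported selector: {selector}")


def route(rules: list[Rule], worker_queue: str, feature_category: str) -> str:
    """Return the queue chosen by the first matching rule."""
    for selector, queue in rules:
        if matches(selector, feature_category):
            return queue if queue is not None else worker_queue
    return worker_queue


def add_before_fallthrough(rules: list[Rule], new_rule: Rule) -> list[Rule]:
    """Insert new_rule immediately above the final ('*', None) rule."""
    if not rules or rules[-1] != ("*", None):
        raise ValueError("expected a final ('*', None) fall-through rule")
    return rules[:-1] + [new_rule, rules[-1]]


rules: list[Rule] = [
    ("feature_category=search", "search_shard"),  # made-up existing shard rule
    ("*", None),                                  # final fall-through
]
rules = add_before_fallthrough(rules, ("feature_category=foo", "default"))
```

In this model, a worker in the `foo` category is routed to `default`, while workers in other categories still fall through to their own queues. The `!=` form stands in for the negated selector on the catchall shard: `matches("feature_category!=foo", "foo")` is false, so that shard would no longer pick up `foo` workers.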
Migration phases
- not_owned - production#4912 (closed)
- Miscellaneous low-use categories - production#5175 (closed)
- code_review/incident_management/importers/pages - production#5188 (closed)
- code_testing/continuous_delivery/subgroups/authentication_and_authorization/gitaly/issue_tracking/requirements_management - production#5207 (closed)
- continuous_integration - production#5201 (closed)
- integrations/dynamic_application_security_testing - production#5308 (closed)
- source_code_management - production#5310 (closed)
- git_lfs/dependency_proxy/container_network_security/error_tracking (these are new categories, or categories with new workers) and cleanup - production#5327 (closed)
Edited by Craig Miskell