Tag some workers that we don't run on GitLab.com

Sean McGivern requested to merge tag-workers-as-excluded-from-gitlab-com into master

https://gitlab.com/gitlab-com/runbooks/-/blob/master/rules-jsonnet/temp-ignored-gprd-queue-list.libsonnet defines an alert for a list of queues that shouldn't run on GitLab.com. These queues aren't used there, so by not listening to them we save the Sidekiq Redis a bit of work. The initial configuration change for this was made in https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/2948.

However, in gitlab-com/gl-infra/k8s-workloads/gitlab-com!394 (merged) we started listening to these on k8s, which undid that work. As the queues aren't actually used, the alert didn't fire.

This adds a tag to those workers so we can do tags=exclude_from_kubernetes,exclude_from_gitlab_com in our k8s catchall shard to exclude these.
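
As a rough illustration of the idea, the sketch below models tag-based exclusion in Python. It is not GitLab's queue selector implementation: the `Worker` class, the `catchall_queues` helper, the two `example:` queues, and the tag assigned to each worker are all assumptions made for the example. Only the two tag names and the Geo and hashed storage queue names come from this merge request.

```python
from dataclasses import dataclass, field

# Tag names taken from the merge request description.
EXCLUDED_TAGS = frozenset({"exclude_from_kubernetes", "exclude_from_gitlab_com"})


@dataclass(frozen=True)
class Worker:
    """Hypothetical stand-in for a Sidekiq worker's queue and tags."""
    queue: str
    tags: frozenset = field(default_factory=frozenset)


def catchall_queues(workers, excluded_tags=EXCLUDED_TAGS):
    """Return the queues of workers that carry none of the excluded tags."""
    return [w.queue for w in workers if not (w.tags & excluded_tags)]


# Illustrative data: the "example:" queues are invented, and the tag
# assignments are assumed for the sake of the example.
WORKERS = [
    Worker("geo:geo_destroy", frozenset({"exclude_from_gitlab_com"})),
    Worker("hashed_storage:hashed_storage_migrator", frozenset({"exclude_from_gitlab_com"})),
    Worker("example:needs_local_disk", frozenset({"exclude_from_kubernetes"})),
    Worker("example:ordinary_job"),
]

# Queues the catchall shard would still listen to in this model.
LISTENED = catchall_queues(WORKERS)
print(LISTENED)
```

In this model a worker is dropped from the catchall shard as soon as it carries either tag, so the exclusion lives with the worker definition instead of in a separately maintained queue list.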

The resulting list isn't exactly the same as the one in the alert. The differences come from some Geo workers that no longer exist, and some that have been added since:

--- expected	2021-05-07 12:34:12.000000000 +0100
+++ actual	2021-05-07 12:44:39.000000000 +0100
@@ -6,7 +6,6 @@
 cronjob:geo_container_repository_sync_dispatch
 cronjob:geo_file_download_dispatch
 cronjob:geo_metrics_update
-cronjob:geo_migrated_local_files_clean_up
 cronjob:geo_prune_event_log
 cronjob:geo_repository_sync
 cronjob:geo_repository_verification_primary_batch
@@ -16,11 +15,15 @@
 cronjob:geo_scheduler_primary_per_shard_scheduler
 cronjob:geo_scheduler_secondary_per_shard_scheduler
 cronjob:geo_secondary_registry_consistency
+cronjob:geo_secondary_usage_data_cron
+cronjob:geo_sync_timeout_cron
+cronjob:geo_verification_cron
 geo:geo_batch_project_registry
 geo:geo_batch_project_registry_scheduler
 geo:geo_container_repository_sync
 geo:geo_design_repository_shard_sync
 geo:geo_design_repository_sync
+geo:geo_destroy
 geo:geo_event
 geo:geo_file_download
 geo:geo_file_registry_removal
@@ -36,10 +39,13 @@
 geo:geo_repository_verification_primary_shard
 geo:geo_repository_verification_primary_single
 geo:geo_repository_verification_secondary_single
+geo:geo_reverification_batch
 geo:geo_scheduler_primary_scheduler
 geo:geo_scheduler_scheduler
 geo:geo_scheduler_secondary_scheduler
-geo:geo_secondary_repository_backfill
+geo:geo_verification
+geo:geo_verification_batch
+geo:geo_verification_timeout
 hashed_storage:hashed_storage_migrator
 hashed_storage:hashed_storage_project_migrate
 hashed_storage:hashed_storage_project_rollback
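
The merge request does not say how the two lists were generated. As a minimal sketch of the same kind of comparison, assuming each list is available as a sequence of queue names, set differences give the removed and added queues. The `compare_queue_lists` helper is hypothetical, and the sample lists are a small subset of the names in the diff above.

```python
def compare_queue_lists(expected, actual):
    """Return (removed, added): queues only in expected, and only in actual."""
    expected_set, actual_set = set(expected), set(actual)
    removed = sorted(expected_set - actual_set)
    added = sorted(actual_set - expected_set)
    return removed, added


# Small subset of the queue names from the diff, for illustration only.
EXPECTED = [
    "cronjob:geo_metrics_update",
    "cronjob:geo_migrated_local_files_clean_up",
    "geo:geo_event",
    "geo:geo_secondary_repository_backfill",
]
ACTUAL = [
    "cronjob:geo_metrics_update",
    "cronjob:geo_verification_cron",
    "geo:geo_destroy",
    "geo:geo_event",
]

REMOVED, ADDED = compare_queue_lists(EXPECTED, ACTUAL)
print("only in expected:", REMOVED)
print("only in actual:", ADDED)
```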

The only Geo worker that runs on GitLab.com is not affected by this change: https://thanos-query.ops.gitlab.net/graph?g0.range_input=1w&g0.max_source_resolution=0s&g0.expr=sum%20by%20(queue)%20(gitlab_background_jobs%3Aqueue%3Aops%3Arate_5m%7Benvironment%3D%22gprd%22%2C%20feature_category%3D%22geo_replication%22%7D)&g0.tab=0
