Fix ConcurrencyLimitSampler reporting inconsistent limits
Context
This was discovered in gitlab-org/gitlab#553604 (comment 2731909956), after the use_max_concurrency_limit_percentage_as_default_limit FF is enabled.
The feature flag is to set default concurrency limit on all workers based on the worker's urgency the shard's available threads #215 (closed).
Problem
On the worker below, the Max limit should be a flat line, since it's a static limit.
We should be expecting the max limit to be a flat line of 75, based on 25% (low urgency) * 10 (catchall concurrency) * 30 (maxReplicas)
https://dashboards.gitlab.net/goto/8pfJITrHg?orgId=1
So, turns out this is due to the concurrency limit sampler can run from any shard, and the GITLAB_SIDEKIQ_MAX_REPLICAS would also change depending on the shard:
When catchall (the actual shard that the worker resides) reports the metrics, it correctly reports as 75, but if low-urgency-cpu-bound happens to report the metrics, it'd report as different number.
The way that ConcurrencyLimitSampler works is 1 process holding exclusive lease lock (amongst all other processes across all shards) and emit metrics above for all workers. For limits specifically, we also read GITLAB_SIDEKIQ_MAX_REPLICAS env var to calculate the max limit. Therefore different shards would report different limits, regardless of the worker itself.
Potential Solution
Instead of 1 process reporting metrics for all shards, we'll have 1 process per shard emitting the metrics, something like:
diff --git a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
index 4c959a2bfe9b..7a578cfce854 100644
--- a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
+++ b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
@@ -54,6 +54,10 @@ def lease_timeout
LEASE_TIMEOUT
end
+ def lease_key
+ "#{self.class.name.underscore}:shard:#{current_queues}"
+ end
+
def report_metrics
workers.each do |w|
queue_size = concurrent_limit_service.queue_size(w.name)
@@ -80,7 +84,11 @@ def reset_metrics
end
def workers
- Gitlab::SidekiqConfig.workers_without_default.map(&:klass)
+ Gitlab::SidekiqConfig.current_worker_queue_mappings.keys
+ end
+
+ def current_queues
+ Sidekiq.default_configuration.queues
end
def concurrent_limit_service

