[Cohort 5] Route resource-usage limits through the labkit rate-limit adapter
What does this MR do and why?
Routes the three per-database Sidekiq resource-usage limits
(main_/ci_/sec_db_duration_limit_per_worker) through the labkit adapter,
one labkit Limiter per database, gated behind the cohort_5 wip flags
for shadow validation and enforcement.
These are cost-mode limits: the value accumulated per window is the job's
DB duration (check(cost:) via labkit's INCRBYFLOAT), not a count of calls.
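To illustrate the difference between cost-mode and a call count, here is a minimal sketch with an in-memory hash standing in for the Redis counter. `CostWindowCounter`, its fixed-window bucketing, and the strict `>` comparison are assumptions for illustration, not labkit's actual implementation.

```ruby
# Hypothetical in-memory stand-in for a windowed Redis counter.
# Not labkit's implementation; it only illustrates cost-mode accumulation.
class CostWindowCounter
  def initialize
    @totals = Hash.new(0.0)
  end

  # Adds `cost` to the total for the window containing `now` and reports
  # whether the accumulated total exceeds `threshold`.
  # Assumption: fixed windows and a strict `>` comparison.
  def check(key:, cost:, threshold:, interval:, now: Time.now.to_f)
    window = (now / interval).floor
    bucket = [key, window]
    @totals[bucket] += cost.to_f # analogous to INCRBYFLOAT <key> <cost>
    { total: @totals[bucket], exceeded: @totals[bucket] > threshold }
  end
end

counter = CostWindowCounter.new
first  = counter.check(key: 'main_db:MyWorker', cost: 1.5, threshold: 2.0, interval: 60, now: 0.0)
second = counter.check(key: 'main_db:MyWorker', cost: 1.0, threshold: 2.0, interval: 60, now: 10.0)
```

In this sketch two jobs are enough to exceed a 2.0-second budget because their durations (1.5 + 1.0) are summed, whereas a count-based limit would only have seen two calls.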
Each entry is a single rule whose threshold and interval are supplied
per call through rule_context. Gitlab::SidekiqLimits.limits_for already
resolves the worker's urgency rule and any ApplicationSetting override upstream
and returns one resolved [threshold, interval], so the labkit rule must use
that resolved value rather than a static constant; otherwise the shadow
comparison against the legacy path would diverge the moment an override is set.
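The divergence described above can be sketched as follows. `STATIC_RULE`, `resolved_limits`, and `exceeded?` are hypothetical stand-ins (the real resolution happens in Gitlab::SidekiqLimits.limits_for), and the numbers are made up.

```ruby
# Hypothetical static rule, as it would look if baked into the registry.
STATIC_RULE = { threshold: 100.0, interval: 60 }.freeze

# Hypothetical stand-in for the upstream resolution: the override wins
# when present, otherwise the default applies.
def resolved_limits(override: nil)
  override || [STATIC_RULE[:threshold], STATIC_RULE[:interval]]
end

def exceeded?(total, threshold)
  total > threshold
end

accumulated = 75.0 # seconds of DB time already recorded in the window

# An override lowers the threshold to 50 seconds.
threshold, interval = resolved_limits(override: [50.0, 60])
rule_context = { threshold: threshold, interval: interval }

legacy_decision  = exceeded?(accumulated, threshold)                # legacy path
static_decision  = exceeded?(accumulated, STATIC_RULE[:threshold])  # static rule
context_decision = exceeded?(accumulated, rule_context[:threshold]) # per-call rule
```

With the override set, the static rule disagrees with the legacy path while the rule fed from the resolved value agrees, which is the behaviour the shadow comparison depends on.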
A single registry flag, cost_mode: true, marks these entries (no per-cohort
caller flags). For routing, the adapter treats cost-mode like set-mode: it always
runs the labkit path (the caller-supplied threshold/interval are this key's real
config, not a user override to bail on), and resolves both per call via
rule_context (cohort 4's callable machinery). cost_mode also tells the adapter
to pass the job's DB duration as check(cost:). A zero-cost job short-circuits
without creating a counter, matching IncrementResourceUsagePerAction#increment.
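A rough sketch of the routing rule described above, assuming the pre-existing per-key behaviour is to fall back to the legacy path when a caller supplies an override. `REGISTRY`, `route`, the `some_count_limit` key, and the returned symbols are hypothetical names for illustration, not the adapter's real interface.

```ruby
# Hypothetical registry: only the cost_mode flag comes from this MR.
REGISTRY = {
  main_db_duration_limit_per_worker: { cost_mode: true },
  some_count_limit: {} # hypothetical non-cost-mode key
}.freeze

# Returns which path handles the call.
# Assumption: non-cost-mode keys bail to legacy on a caller override.
def route(key, caller_override:, cost: nil)
  entry = REGISTRY.fetch(key)

  if entry[:cost_mode]
    # A zero-cost job never creates a counter.
    return :skip if cost.to_f.zero?

    # The caller-supplied threshold/interval are the key's real config,
    # so the labkit path always runs.
    :labkit
  elsif caller_override
    :legacy
  else
    :labkit
  end
end

cost_route  = route(:main_db_duration_limit_per_worker, caller_override: true, cost: 0.8)
zero_route  = route(:main_db_duration_limit_per_worker, caller_override: true, cost: 0.0)
count_route = route(:some_count_limit, caller_override: true)
```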
This MR sits on top of the cohort 6 adapter refactor (merged), which introduced the
per-key override routing that this MR extends with a cost_mode clause.
Issue: gitlab-com/gl-infra/production-engineering#28812 (closed)
References
- Cohort 6 (predecessor, merged): !238341
- Epic: gitlab-com/gl-infra&2021
Screenshots or screen recordings
Not applicable (backend rate-limiting change, no UI).
How to set up and validate locally
- In the rails console, enable the shadow flag:
  Feature.enable(:rate_limiter_use_labkit_cohort_5)
- Run a Sidekiq job that consumes DB time (any ApplicationWorker). Both the legacy counter (application_rate_limiter:...) and the labkit counter (labkit:rl:applimiter_main_db_duration_limit_per_worker:...) should accumulate the job's DB duration, and the gitlab_rate_limiter_labkit_shadow_total counter should record agreement between the two paths.
- To let labkit's decision win, also enable enforcement:
  Feature.enable(:rate_limiter_use_labkit_cohort_5_enforce)
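How the two flags interact can be modelled roughly as below. This is a sketch under the assumption that enforcement only takes effect when the shadow flag is also on; `decide` and the `metrics` hash are hypothetical and stand in for the adapter and the shadow counter.

```ruby
# Hypothetical model of shadow validation versus enforcement.
# `metrics` stands in for the agreement counter recorded in shadow mode.
def decide(legacy_throttled:, labkit_throttled:, shadow:, enforce:, metrics:)
  # With the shadow flag off, only the legacy path matters.
  return legacy_throttled unless shadow

  outcome = legacy_throttled == labkit_throttled ? :agree : :disagree
  metrics[outcome] += 1

  # Shadow mode keeps the legacy decision; enforcement lets labkit win.
  enforce ? labkit_throttled : legacy_throttled
end

metrics = Hash.new(0)
shadow_only = decide(legacy_throttled: false, labkit_throttled: true,
                     shadow: true, enforce: false, metrics: metrics)
enforced    = decide(legacy_throttled: false, labkit_throttled: true,
                     shadow: true, enforce: true, metrics: metrics)
```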
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.