Add labkit rate limit adapter for cohort 2 keys

What does this MR do and why?

Route 95 IncrementPerAction call sites (83 CE + 12 EE) in Gitlab::ApplicationRateLimiter through Labkit::RateLimit::Limiter, gated by a shared rate_limiter_use_labkit_cohort_2 / _enforce flag pair.

Three pattern extensions over cohort 1:

  • Cohort-wide flag pair instead of per-key flags. Per-key visibility is preserved at the Prometheus layer (gitlab_rate_limiter_labkit_shadow_total{key}). Selection between per-key (cohort 1) and cohort-wide (cohort 2+) is encoded by an optional flag_scope: field on each registry entry; cohort 1 entries retain their per-key flags.

  • SUPPORTED_RATE_LIMITS extracted to a sibling file at lib/gitlab/application_rate_limiter/labkit_adapter/supported_rate_limits.rb, exposed via SupportedRateLimits.all (memoized + frozen) built from SupportedRateLimits.entries (overridable). EE-only keys are merged via super.merge in ee/lib/ee/gitlab/application_rate_limiter/labkit_adapter/supported_rate_limits.rb, prepended through prepend_mod.

  • Type-aware identifier building. The adapter routes scope values to characteristics by AR class (User to :user, Project to :project, etc.) using is_a? so STI subclasses match their registered base (DeployKey populates :key); non-AR values fill remaining characteristics positionally. Polymorphic positions list every accepted name; labkit's '_unknown_' sentinel fills alternatives not in scope, distinguishing Redis keys per real type. This shape caught a latent collision in web_hook_event_resend that would have conflated Project and Group hook resends on _enforce flip.
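
The type-aware routing described in the last bullet can be sketched roughly as follows. This is an illustration only, not the adapter's code: the stand-in model classes, TYPE_TO_CHARACTERISTIC, characteristic_for, build_identifier, and the characteristic lists passed to it are all hypothetical names. Only the is_a? matching, the positional fallback, and the '_unknown_' sentinel come from the description above.

```ruby
# Hypothetical stand-ins for ActiveRecord models (assumption, not GitLab code).
class Record
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

class User < Record; end
class Project < Record; end
class Group < Record; end
class Key < Record; end
class DeployKey < Key; end # STI-style subclass of Key

UNKNOWN = '_unknown_' # sentinel value named in the MR description

# Assumed mapping from registered base class to characteristic name.
TYPE_TO_CHARACTERISTIC = {
  User => :user,
  Project => :project,
  Group => :group,
  Key => :key
}.freeze

# is_a? lets a subclass (DeployKey) match its registered base (Key).
def characteristic_for(value)
  TYPE_TO_CHARACTERISTIC.find { |klass, _name| value.is_a?(klass) }&.last
end

# characteristics: ordered names for one rate limit key.
# scope: array of values passed by the call site.
def build_identifier(characteristics, scope)
  typed = {}
  positional = []

  scope.each do |value|
    name = characteristic_for(value)
    if name && characteristics.include?(name) && !typed.key?(name)
      typed[name] = value.id
    else
      positional << value # untyped values are consumed in order below
    end
  end

  characteristics.to_h do |name|
    if typed.key?(name)
      [name, typed[name]]
    elsif TYPE_TO_CHARACTERISTIC.value?(name)
      [name, UNKNOWN] # polymorphic alternative that is not in scope
    else
      [name, positional.empty? ? UNKNOWN : positional.shift.to_s]
    end
  end
end

# A polymorphic position lists every accepted name, so a Project and a
# Group with the same id produce different identifiers.
names = %i[user project group]
project_hook = build_identifier(names, [User.new(1), Project.new(7)])
group_hook   = build_identifier(names, [User.new(1), Group.new(7)])
deploy_key   = build_identifier(%i[key ip], [DeployKey.new(3), '10.0.0.1'])
```

Under these assumptions, project_hook carries project: 7 with group: '_unknown_', while group_hook carries the reverse, which is the kind of separation that avoids the web_hook_event_resend collision mentioned above.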

Three keys initially omitted were added during review: email_verification, web_hook_test, project_testing_integration.

Some EE keys are excluded; the reasons are documented in the registry's exclusion comment.

Operational notes

  • Cohort 1 Redis key shape changed. Cohort 1 entries' characteristic names were renamed :project_id / :user_id -> :project / :user to align with the type-aware identifier introduced for cohort 2. Cohort 1 _enforce flags are off in production and the labkit Redis keyspace is disjoint from legacy, so the impact is shadow-only: any prior shadow data on cohort 1 keys ages out of Redis and reaccumulates under the new key shape after this ships.

  • The cohort 2 flag pair is binary on/off, not percentage. Many cohort 2 keys (bitbucket_server_import, gitea_import, github_import, project_import, project_fork_sync, etc.) fire from Sidekiq workers, where Feature.current_request resolves to a per-call UUID and percentage rollouts behave non-deterministically. Operate as fully on or fully off; a rollout work item with chatops commands will follow.

  • No surgical kill switch within cohort 2. With ~95 keys flipping under one flag, a single key with intolerable shadow divergence is mitigated by removing it from the registry and reshipping, not by toggling. Per-key visibility remains in the gitlab_rate_limiter_labkit_shadow_total{key} Prometheus counter.
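
A minimal sketch of how an optional flag_scope: field could select between per-key and cohort-wide flags, and why removing a registry entry is the only per-key mitigation within cohort 2. Everything here is assumed for illustration: the REGISTRY contents, the cohort_one_key entry, the :cohort_2 value, the per-key flag naming, the full name of the enforce flag, and the flag_pair and labkit_supported? helpers are hypothetical and do not reflect the actual registry layout.

```ruby
# Hypothetical registry: entries without flag_scope keep per-key flags
# (cohort 1); entries with flag_scope share one cohort-wide pair.
REGISTRY = {
  cohort_one_key: {},                                # placeholder cohort 1 entry
  github_import: { flag_scope: :cohort_2 },
  project_import: { flag_scope: :cohort_2 }
}.freeze

# Returns the assumed names of the shadow and enforce flags for a key.
def flag_pair(key, registry = REGISTRY)
  entry = registry.fetch(key)
  suffix = entry[:flag_scope] || key                 # cohort-wide or per-key
  base = "rate_limiter_use_labkit_#{suffix}"
  { use: base.to_sym, enforce: :"#{base}_enforce" }
end

# A key absent from the registry never reaches the labkit path, which is
# why removal plus a reship acts as the per-key mitigation.
def labkit_supported?(key, registry = REGISTRY)
  registry.key?(key)
end

github = flag_pair(:github_import)
import = flag_pair(:project_import)
cohort_one = flag_pair(:cohort_one_key)
```

In this sketch github and import resolve to the same pair, so neither can be toggled independently, while cohort_one resolves to flags of its own.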

References

gitlab-com/gl-infra/production-engineering#28809

Edited by Max Woolf
