Add labkit rate limit adapter for cohort 1 keys
What does this MR do?
Adds Gitlab::ApplicationRateLimiter::LabkitAdapter, which routes a first cohort of five rate-limit keys (pipelines_create, notes_create, search_rate_limit, users_get_by_id, user_sign_in) through Labkit::RateLimit::Limiter. This is the start of the migration from the in-house ApplicationRateLimiter strategy classes to the shared labkit primitive.
How is the rollout controlled?
Two wip-type feature flags per key:
- `rate_limiter_use_labkit_<key>` opts the key into the labkit path.
- `rate_limiter_use_labkit_<key>_enforce` lets labkit's decision win over legacy.
The two flags produce three meaningful states:
| _use_labkit | _enforce | What runs |
|---|---|---|
| off | off | Legacy only (status quo). |
| on | off | Both paths run; legacy decides. The Prometheus shadow counter records per-key agreement so a 24-hour shadow run can confirm parity. |
| on | on | Only the labkit path runs; its decision is returned. |
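The three states in the table can be sketched in plain Ruby. This is a minimal illustration, not the adapter's actual code: the `throttled?` method, its keyword arguments, and the hash-based tally (standing in for the Prometheus shadow counter) are all hypothetical. It also assumes that `_enforce` on without `_use_labkit` behaves as legacy only, which the table does not state.

```ruby
# Hypothetical sketch of the dual-flag dispatch; none of these names come
# from the MR. `legacy` and `labkit` are callables that return true when
# the request should be throttled.
def throttled?(use_labkit:, enforce:, legacy:, labkit:, tally:)
  # off / off: legacy only (status quo).
  # Assumption: _enforce alone, without _use_labkit, is also legacy only.
  return legacy.call unless use_labkit

  # on / on: only the labkit path runs and its decision is returned.
  return labkit.call if enforce

  # on / off (shadow): both paths run, agreement is recorded, legacy decides.
  legacy_result = legacy.call
  labkit_result = labkit.call
  tally[legacy_result == labkit_result ? :agree : :disagree] += 1
  legacy_result
end

# Usage: a shadow-mode check where the two paths disagree.
tally = Hash.new(0) # stand-in for the shadow counter
decision = throttled?(
  use_labkit: true,
  enforce: false,
  legacy: -> { true },
  labkit: -> { false },
  tally: tally
)
```

In this sketch the shadow state never lets the labkit result affect the returned decision; it only feeds the agreement tally, which is what makes a parity check possible before `_enforce` is turned on.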
The legacy and labkit Redis key namespaces are intentionally disjoint (application_rate_limiter:... vs labkit:rl:...) so both counters can run in parallel during shadow validation without interference.
The full per-key rollout procedure (with chatops commands and pass criteria) is tracked in #598560.
Verification
- 71 RSpec examples (52 existing + 19 new) cover signature stability, scope normalization, the labkit Redis key format, fail-open behavior, the dual-flag wiring, and the Prometheus shadow counter.
- Manual end-to-end testing against a local GDK confirmed each of the five keys behaves correctly in all three states (off / shadow / enforce). Both Redis key shapes appear under shadow; only the labkit shape appears under enforce.
Operational notes
- Feature flag changes take up to 60 seconds to propagate to all Puma workers (Flipper L1 process cache TTL). The rollout runbook should pause at least 60 seconds between toggles so that dashboards do not show mixed-state behavior.
- The labkit path adds one Redis round-trip per check (the recovery `GET` used to recover `current_count` for the utilization-ratio histogram, since `Labkit::RateLimit::Result` does not yet expose it). This is a temporary cost until labkit's `Result` carries the count natively.
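The extra round-trip in the last note can be illustrated with an in-memory stand-in for Redis. Everything here is a hypothetical sketch: `FakeStore`, `SketchResult`, `check`, `check_with_utilization`, and the key name are invented for illustration and do not reflect the real labkit or adapter APIs. It also assumes the utilization ratio is the current count divided by the limit.

```ruby
# Hypothetical in-memory stand-in for Redis that counts round-trips.
class FakeStore
  attr_reader :round_trips

  def initialize
    @data = Hash.new(0)
    @round_trips = 0
  end

  def incr(key)
    @round_trips += 1
    @data[key] += 1
  end

  def get(key)
    @round_trips += 1
    @data[key]
  end
end

# Stand-in for a result object that reports the decision but not the count.
SketchResult = Struct.new(:exceeded)

# The limiter check itself: one round-trip (the increment).
def check(store, key, limit)
  SketchResult.new(store.incr(key) > limit)
end

# The wrapper issues a second round-trip to recover the count, because the
# result does not carry it. Assumption: utilization = current_count / limit.
def check_with_utilization(store, key, limit)
  result = check(store, key, limit)
  current_count = store.get(key) # the extra GET
  [result, current_count.to_f / limit]
end

store = FakeStore.new
result, ratio = check_with_utilization(store, "example-key", 4)
```

If the result object carried the count itself, the `get` call, and with it the second round-trip, could be dropped.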
References
Edited by Max Woolf