Add labkit rate limit adapter for cohort 1 keys (!233816)

What does this MR do?

Adds Gitlab::ApplicationRateLimiter::LabkitAdapter, which routes a first cohort of five rate-limit keys (pipelines_create, notes_create, search_rate_limit, users_get_by_id, user_sign_in) through Labkit::RateLimit::Limiter. This is the start of the migration from the in-house ApplicationRateLimiter strategy classes to the shared labkit primitive.

How is the rollout controlled?

Two wip-type feature flags per key:

rate_limiter_use_labkit_<key> opts the key into the labkit path.
rate_limiter_use_labkit_<key>_enforce switches the key to enforce mode, where labkit's decision is returned instead of legacy's.
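
For reference, the flag names for the five cohort 1 keys can be derived from the naming pattern above. This is a minimal sketch: `COHORT_1_KEYS` and `labkit_flag_names` are hypothetical names used only for illustration and are not part of this MR.

```ruby
# Cohort 1 keys as listed in this MR description.
COHORT_1_KEYS = %i[
  pipelines_create
  notes_create
  search_rate_limit
  users_get_by_id
  user_sign_in
].freeze

# Hypothetical helper (not in the MR): derive both flag names for a key
# from the rate_limiter_use_labkit_<key> pattern.
def labkit_flag_names(key)
  opt_in = "rate_limiter_use_labkit_#{key}"
  { use_labkit: opt_in, enforce: "#{opt_in}_enforce" }
end

COHORT_1_KEYS.each do |key|
  flags = labkit_flag_names(key)
  puts "#{key}: #{flags[:use_labkit]}, #{flags[:enforce]}"
end
```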

The two flags produce three meaningful states:

| `_use_labkit` | `_enforce` | What runs |
| --- | --- | --- |
| off | off | Legacy only (status quo). |
| on | off | Both paths run; legacy decides. The Prometheus shadow counter records per-key agreement so a 24-hour shadow run can confirm parity. |
| on | on | Only the labkit path runs; its decision is returned. |
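
The three states can be sketched as a small router. This is an illustration under stated assumptions, not the adapter's actual code: `DualFlagRouter` is a hypothetical class, the flag values are passed in as plain booleans rather than read from feature flags, an in-memory hash stands in for the Prometheus shadow counter, and the fourth combination (`_enforce` on while `_use_labkit` is off) is assumed to fall back to legacy, which the description does not state.

```ruby
# Hypothetical stand-in for the routing described in the table. The real
# Gitlab::ApplicationRateLimiter::LabkitAdapter may be structured differently.
class DualFlagRouter
  attr_reader :shadow_counts

  # legacy and labkit are callables that return true when the request
  # should be throttled.
  def initialize(legacy:, labkit:)
    @legacy = legacy
    @labkit = labkit
    # Per-key tally of shadow comparisons; stands in for the Prometheus counter.
    @shadow_counts = Hash.new { |hash, key| hash[key] = { agree: 0, disagree: 0 } }
  end

  def throttled?(key, use_labkit:, enforce:)
    # off/off: legacy only. Assumption: enforce without opt-in is also legacy.
    return @legacy.call(key) unless use_labkit
    # on/on: only the labkit path runs and its decision is returned.
    return @labkit.call(key) if enforce

    # on/off: shadow mode. Both paths run, legacy decides, agreement is recorded.
    legacy_result = @legacy.call(key)
    labkit_result = @labkit.call(key)
    outcome = legacy_result == labkit_result ? :agree : :disagree
    @shadow_counts[key][outcome] += 1
    legacy_result
  end
end

# Usage: the two paths disagree, so shadow mode returns the legacy answer
# and records one disagreement for the key.
router = DualFlagRouter.new(
  legacy: ->(_key) { false },
  labkit: ->(_key) { true }
)
router.throttled?(:notes_create, use_labkit: true, enforce: false)
```

Keeping the comparison inside the shadow branch means the off and enforce states each call exactly one path, which matches the "What runs" column.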

The legacy and labkit Redis key namespaces are intentionally disjoint (application_rate_limiter:... vs labkit:rl:...) so both counters can run in parallel during shadow validation without interference.

The full per-key rollout procedure (with chatops commands and pass criteria) is tracked in #598560.

Verification

71 RSpec examples (52 existing + 19 new) cover signature stability, scope normalization, the labkit Redis key format, fail-open behavior, the dual-flag wiring, and the Prometheus shadow counter.
Manual end-to-end testing against a local GDK confirmed each of the five keys behaves correctly in all three states (off / shadow / enforce). Both Redis key shapes appear under shadow; only the labkit shape appears under enforce.
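
The key-shape check from the manual testing can be expressed as a small classifier over observed Redis key names. This is a sketch only: `observed_state` is a hypothetical helper, it relies solely on the two namespace prefixes named in this description, and it assumes the sample contains keys for a single rate-limit key with no leftover legacy keys that have yet to expire.

```ruby
LEGACY_PREFIX = "application_rate_limiter:"
LABKIT_PREFIX = "labkit:rl:"

# Hypothetical helper: infer the rollout state from the Redis key names seen
# for one rate-limit key. Only the namespace prefixes are inspected.
def observed_state(redis_keys)
  legacy = redis_keys.any? { |name| name.start_with?(LEGACY_PREFIX) }
  labkit = redis_keys.any? { |name| name.start_with?(LABKIT_PREFIX) }

  if legacy && labkit
    :shadow   # both key shapes present
  elsif labkit
    :enforce  # only the labkit shape present
  elsif legacy
    :off      # only the legacy shape present
  else
    :no_keys  # nothing observed yet
  end
end

# The suffixes below are placeholders, not the real key format.
puts observed_state(["application_rate_limiter:example", "labkit:rl:example"])
```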

Operational notes

Feature flag changes can take up to 60 seconds to reach all Puma workers (the Flipper L1 process cache TTL). The rollout runbook should wait at least 60 seconds between toggles to avoid mixed-state behavior showing up in dashboards.
The labkit path adds one Redis round-trip per check: a GET that recovers current_count for the utilization-ratio histogram, since Labkit::RateLimit::Result does not yet expose it. This cost is temporary and goes away once labkit's Result carries the count natively.
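
To make the extra round-trip concrete, here is a rough sketch of recovering the count and turning it into a utilization ratio. Everything in it is an assumption for illustration: `FakeCounterStore` is an in-memory stand-in for Redis, `utilization_ratio` is a hypothetical helper, the counter is assumed to be stored as a plain integer string, and the ratio is assumed to be current count divided by threshold.

```ruby
# In-memory stand-in for Redis that counts round-trips. Not a real client.
class FakeCounterStore
  attr_reader :round_trips

  def initialize(counts = {})
    @counts = counts
    @round_trips = 0
  end

  # Mirrors the shape of a Redis GET: a string value, or nil for a missing key.
  def get(key)
    @round_trips += 1
    @counts[key]&.to_s
  end
end

# Hypothetical helper: one GET to recover the current count, then express it
# as a fraction of the threshold. Returns nil when there is nothing to report.
def utilization_ratio(store, counter_key, threshold)
  raw = store.get(counter_key)
  return nil if raw.nil? || threshold <= 0

  raw.to_i.to_f / threshold
end

# The key suffix is a placeholder, not the real labkit key format.
store = FakeCounterStore.new("labkit:rl:example" => 30)
puts utilization_ratio(store, "labkit:rl:example", 60)
puts store.round_trips
```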

References

#598560 (rollout tracking)
gitlab-com/gl-infra/production-engineering#28803 (closed)
gitlab-com/gl-infra/production-engineering#28808

Edited Apr 29, 2026 by Max Woolf
