[Spec 8] Stage 1a follow-up: Limiter API redesign per reprazent review (labkit-ruby) (#28792) · Issues · GitLab.com / GitLab Infrastructure Team / Production Engineering

[Spec 8] Stage 1a follow-up: Limiter API redesign per reprazent review (labkit-ruby)

## Spec ### Problem Statement [required] MR !270 (Stage 1a) shipped the initial `Labkit::RateLimit` API and was merged 2026-04-27 for momentum. Reviewer reprazent left 10 unresolved threads covering four categories of concern: 1. **API architecture**: the module-level `.check(...)` method allocates a new `Evaluator` on every request (memory churn); Redis and logger have no configure-block; the internal `call_site` argument name is an implementation detail. 2. **Counter semantics**: multi-characteristic rules create N separate Redis keys (one per characteristic) rather than a single compound key; `limit` and `period` cannot be callables, forcing rule rebuilds on DB config change; missing characteristic values skip the counter entirely instead of landing in a known sentinel bucket. 3. **Return-value expressiveness**: a `:log` rule that is exceeded returns `:allow`, making it impossible for callers to distinguish "well within limits" from "would have been rate-limited". 4. **Cleanup**: `KNOWN_CHARACTERISTICS` is hardcoded and duplicated, preventing use of arbitrary identifier keys as characteristics. Additionally, a design decision was made after !270 merged: **first-match-wins** rule evaluation (discussed in https://gitlab.com/gitlab-com/content-sites/handbook/-/merge_requests/19540#note_3284917347). Rules are evaluated in order; the first matching rule is the only one applied. This simplifies tiered overrides (most specific rule first) and avoids the complexity of overlapping counters. The return type also changes from bare symbols to a **result object** carrying the matched rule and action. Response header state (`remaining`, `reset_time`, `limit`, `counter_key`) will be added to the result object in #28785. These must be resolved before Stage 2 (gitlab-rails) consumes the API. ### Non-Goals [required] - **Rack middleware integration** — Stage 2a, separate issue in `gitlab-org/gitlab` - **`ApplicationRateLimiter` integration** — Stage 2b, separate issue - **YAML rule loading** via the configure block — future work, explicitly deferred - **Expression-matching engine / external service** — future work - **labkit-go** — out of scope - **Removing `Labkit::RateLimit.check`** convenience wrapper — kept for callers that cannot cache a Limiter - **Response header state** on the result object (`remaining`, `reset_time`, `limit`, `counter_key`) — tracked in #28785, which enriches the result object with fields needed for `RateLimit-*` response headers ### Acceptance Criteria [required] **Scenario A: Limiter object reuse across requests** - Given: a `Labkit::RateLimit::Limiter` is instantiated with `name:` and `rules:` - When: `.check(identifier)` is called twice with different identifiers - Then: the internal `Evaluator` instance is the same object both times (`object_id` unchanged); no new allocations of Evaluator per call **Scenario B: Compound multi-characteristic Redis key** - Given: a Rule with `characteristics: [:user, :endpoint]`, name `"auth_api"` - When: `limiter.check({ user: 42, endpoint: "/api/foo" })` - Then: Redis receives exactly **one** `incr` call with a compound key like `labkit:rl:rack_request:auth_api:user:42:endpoint:/api/foo`; no separate user-only or endpoint-only keys are written **Scenario C: Missing characteristic → `_unknown_` sentinel in key** - Given: a Rule with `characteristics: [:user]`, name `"auth_api"` - When: `limiter.check({ ip: "1.2.3.4" })` (no `:user` key in identifier) - Then: Redis key uses sentinel: `labkit:rl:rack_request:auth_api:user:_unknown_`; counter is incremented — traffic is not silently uncounted **Scenario D: Callable `limit` and `period` resolved at check time** - Given: a Rule with `limit: -> { 5 }` and `period: -> { 60 }` - When: `limiter.check(identifier)` is called - Then: the lambdas are invoked during `.check`, not during `Limiter.new`; a second `.check` call invokes them again (picking up any config changes) **Scenario E: First-match-wins — only the first matching rule is evaluated** - Given: two rules — rule A (matches) and rule B (also matches) - When: `limiter.check(identifier)` - Then: only rule A's counter is incremented in Redis; rule B is not evaluated and its counter is not touched **Scenario F: Rule ordering — more specific rules first** - Given: rules ordered as `[specific_endpoint_rule, generic_api_fallback]` - When: identifier matches both (it has the specific endpoint and is an API request) - Then: the specific endpoint rule is matched and counted; the generic fallback is not evaluated **Scenario G: No rules match — result indicates no match** - Given: rules array is non-empty but no rule's `match` is satisfied - When: `limiter.check(identifier)` - Then: `result.matched?` is `false`, `result.exceeded?` is `false`, `result.action` is `nil`, `result.rule` is `nil`, no Redis counters are incremented **Scenario H: Empty rules array** - Given: `rules: []` - When: `limiter.check(identifier)` - Then: `result.matched?` is `false`, `result.exceeded?` is `false`, `result.action` is `nil`, `result.rule` is `nil` **Scenario I: Result object — within limit** - Given: a matched rule with `limit: 10`, `action: :block`, current count is 3 - When: `limiter.check(identifier)` - Then: result has `matched?: true`, `exceeded?: false`, `action: :block`, `rule` is the matched Rule object **Scenario J: Result object — `:block` rule exceeded** - Given: a matched rule with `limit: 10`, `action: :block`, current count is 10 - When: `limiter.check(identifier)` - Then: result has `matched?: true`, `exceeded?: true`, `action: :block`, `rule` is the matched Rule object **Scenario K: Result object — `:log` rule exceeded** - Given: a matched rule with `limit: 1`, `action: :log`, current count is 1 - When: `limiter.check(identifier)` - Then: result has `matched?: true`, `exceeded?: true`, `action: :log`, `rule` is the matched Rule object; the caller decides what to do based on `result.action` and `result.exceeded?` **Scenario L: Rule with empty `match` matches any identifier** - Given: a rule with `match: {}` (catch-all / default rule) - When: `limiter.check(identifier)` with any identifier - Then: the rule matches **Scenario M: Configure block wires redis and logger** - Given: `Labkit::RateLimit.configure { |c| c.redis = my_redis; c.logger = my_logger }` - When: `Labkit::RateLimit::Limiter.new(name: "test", rules: [])` with no explicit `redis:` or `logger:` - Then: the Limiter uses `my_redis` and `my_logger` from configuration **Scenario N: Dependency injection overrides configure block** - Given: a global configure block with `redis_a` and `logger_a` - When: `Limiter.new(name: "test", rules: [], redis: redis_b, logger: logger_b)` - Then: the Limiter uses `redis_b` and `logger_b`, not the global ones **Scenario O: Arbitrary characteristic accepted (no KNOWN_CHARACTERISTICS gate)** - Given: a Rule with `characteristics: [:custom_field]`, identifier `{ custom_field: "abc" }` - When: `limiter.check(identifier)` in test/development environment - Then: does not raise `ArgumentError`; key includes `custom_field:abc` **Scenario P: Endpoint query string stripped in Identifier, not Evaluator** - Given: `Identifier.new({ endpoint: "/api/foo?bar=baz&x=1" })` - When: `identifier[:endpoint]` is accessed - Then: returns `"/api/foo"` (query string stripped at construction time) **Scenario Q: Rule `name` appears in Redis key (not positional index)** - Given: a Rule with `name: "unauthenticated_api"`, `characteristics: [:ip]` - When: `limiter.check({ ip: "1.2.3.4" })` - Then: Redis key contains the rule name, e.g. `labkit:rl:rack_request:unauthenticated_api:ip:1.2.3.4`; no integer index **Scenario R: Counter TTL matches rule period** - Given: a rule with `period: 120` (or `period: -> { 120 }`) - When: the counter key does not yet exist in Redis - Then: the key is created with `INCR` and `EXPIRE` with TTL of 120 seconds **Scenario S: Redis unavailable — fails open** - Given: Redis is unreachable - When: `limiter.check(identifier)` - Then: `result.exceeded?` is `false`, `result.error?` is `true`, the failure is logged with the full identifier **Scenario T: Structured logging on every check** - Given: any call to `limiter.check` - When: a rule matches - Then: a structured log entry is emitted containing: the limiter `name`, the full identifier hash, and for the matched rule: `rule_name`, `characteristics`, `counter_key`, `current_count`, `limit`, `period`, `action`, `exceeded` **Scenario U: `name` argument is mandatory and validated** - Given: `Limiter.new(name: nil, ...)` or `Limiter.new(name: "", ...)` - When: the Limiter is instantiated - Then: an `ArgumentError` is raised ### Security Considerations [required] - **Callable limit/period injection**: callables are provided by the caller at `Limiter` init time — a trusted call site — not from user input. The resolved value must be cast to `Integer` before comparison to prevent type confusion. - **`_unknown_` sentinel spoofing**: an identifier with the literal string `"_unknown_"` as a characteristic value would share a counter with requests missing that dimension. This is known, documented behaviour. Rate limiting erring toward counting is the safer default. - **Redis key length**: compound keys with many characteristics could grow large. SHA-256 truncation (`CHAR_VALUE_MAX_LENGTH = 200`) applies per characteristic value, bounding key length. We control the characteristics — they are not user-supplied. - **No bypass risk introduced**: no new configuration path that could be abused to skip enforcement. - **No secrets in identifiers**: documentation and tests must make clear that secrets (tokens, passwords) must never be in identifiers, as they appear in logs and Redis keys. ### Rollout & Backwards Compatibility [required] **Redis key format change (breaking counter state):** The compound key format (`labkit:rl:{name}:{rule_name}:{char1}:{val1}:...`) replaces the per-characteristic format (`labkit:rl:{call_site}:{rule_index}:{char}:{value}`). All existing counters are reset. This is acceptable: Stage 2 (Rails middleware) has not been deployed; labkit rate limiting is not yet enforcing anything in production. **Public API surface:** - `Labkit::RateLimit::Limiter` — **new**. Primary API going forward. - `Labkit::RateLimit.check(...)` — **kept** as convenience wrapper. - Return value changes from bare symbol to **result object**. This is breaking for callers checking `result == :block`. No callers outside labkit-ruby today. - `Rule` gains `name:` required field — **breaking** for direct Rule constructors. No callers outside this repo. - `KNOWN_CHARACTERISTICS` — **removed**. - `call_site:` → `name:` — rename. - **First-match-wins** — behaviour change from evaluating all matching rules. **Self-managed / Dedicated / Cells:** no impact. Library change only; enforcement is downstream (Stage 2). **Dependency on in-progress work:** #28787 (rule names in keys) and #28785 (response header state) reference the old API. Both must be updated to use the new `Limiter#check` return type and first-match-wins semantics. #28785 specifically builds on the result object from this issue, adding `remaining`, `reset_time`, `limit`, and `counter_key` fields. ### Validation Loop / Verification Process [required] **Commands that must pass before requesting human review:** ```bash bundle exec rspec spec/labkit/rate_limit_spec.rb spec/labkit/rate_limit/ \ --format documentation ``` Expected: all scenarios A–U covered, 0 failures. **Key assertions:** - Redis key assertions verify exact compound key structure (not just that `incr` was called) - First-match-wins verified by asserting only one `incr` call when multiple rules match - Callable timing verified with a lambda recording call count - Result object fields asserted individually (`matched?`, `exceeded?`, `action`, `rule`, `error?`) **Evidence to post to MR:** CI pipeline result + `--format documentation` output. ### Observability [optional] **Structured log fields** (emitted on every check where a rule matches): | Field | Description | |---|---| | `name` | Limiter name | | `rate_limiting.identifier` | Full identifier hash | | `rate_limiting.rule_name` | Matched rule name | | `rate_limiting.counter_key` | Derived Redis key | | `rate_limiting.current_count` | Count after increment | | `rate_limiting.limit` | Resolved threshold | | `rate_limiting.period` | Window in seconds | | `rate_limiting.action` | `:block` or `:log` | | `rate_limiting.exceeded` | Boolean | | `rate_limiting.remaining` | Remaining requests | | `rate_limiting.error` | Boolean — true if Redis unavailable | When no rule matches, a log entry is still emitted with `rate_limiting.matched: false`. Note: `remaining` and `current_count` are logged internally by labkit (it knows the count from Redis) but are not exposed on the result object in this issue. They will be added to the result object in #28785.

issue