[Spec 5] Stage 1a: labkit-ruby Identifier & Rules API (#28784) · Issues · GitLab.com / GitLab Infrastructure Team / Production Engineering


## Spec

### Problem Statement [required]

GitLab's rate limiting is fragmented across 5+ implementations with inconsistent configuration mechanisms. Phase 2 unifies this at the application level. Stage 1a establishes the foundational interface in `labkit-ruby`:

- an `Identifier` value object that carries request context (user, IP, namespace, plan, endpoint) as key-value pairs, and
- a `RateLimit.check` API that accepts a call-site name, an identifier, and a rules array; evaluates which rules match; counts independently per rule via Redis; and returns an aggregate result (`:block` / `:allow`).

This gem API is the contract that the Rails middleware (Stage 2) will call. Defining it first means Rails can target a stable interface instead of one reverse-engineered from Rails-specific concerns. It also lets the GATE auth architecture conform to the same contract.

**Key design decisions from the epic:**

- The call-site name identifies what is being rate-limited (e.g., `"rack_request"`, `"pipelines_create"`).
- The identifier carries context as key-value pairs: `{ user: 42, ip: "1.2.3.4", namespace: 99, endpoint: "/api/v4/projects" }`.
- Rules are evaluated independently; each matched rule increments its own Redis counter.
- Any `:block`-action rule that exceeds its limit produces a `:block` aggregate result.
- Fail open if Redis is unavailable.

### Non-Goals [required]

- Integrating labkit into Rails (Stage 2)
- Expression-matching or external evaluation engine (Phase 3)
- YAML config loading (Stage 3)
- Admin UI (Phase 3)
- Replacing or modifying existing Rack::Attack or ApplicationRateLimiter behavior in Rails; this stage only adds new gem code
- Implementing the new Rack middleware or the ApplicationRateLimiter switch (Stage 2b)
- Per-rule TTL management beyond standard Redis expiry

### Acceptance Criteria [required]

**Scenario 1: Identifier serializes and round-trips**

- Given: an identifier `{ user: 42, ip: "1.2.3.4", endpoint: "/api/v4/projects" }`
- When: `Labkit::RateLimit::Identifier.new(user: 42, ip: "1.2.3.4", endpoint: "/api/v4/projects").serialize` is called
- Then: the serialized form is a stable string; deserializing it returns an identifier with identical key-value pairs; all key names are symbols; all values are strings or integers

**Scenario 2: Rule with matching condition is evaluated and counted**

- Given: a rule `{ match: { user: 42 }, limit: 100, period: 60, action: :block }` at index 0 in the rules array
- When: `Labkit::RateLimit.check(call_site: "rack_request", identifier: { user: 42, ip: "1.2.3.4" }, rules: [rule])` is called
- Then: a Redis key of the form `labkit:rl:rack_request:0:user:42` is incremented; the result is `:allow` (count 1 is within limit 100)

**Scenario 3: Rule with non-matching condition is skipped**

- Given: a rule with `match: { user: 100 }` and `action: :block`
- When: `check` is called with identifier `{ user: 42 }`
- Then: no Redis key is incremented; the result is `:allow`; no log entry is written for the skipped rule (non-matching rules are silent)

**Scenario 4: Any exceeded :block rule produces a :block aggregate result**

- Given: two rules, both matching; rule A at index 0 with `action: :log, limit: 10`; rule B at index 1 with `action: :block, limit: 5`; rule B's counter is at 6
- When: `check` is called
- Then: the result is `:block`; rule A is still counted independently (and stays within its own limit); both rules appear in the structured log output

**Scenario 5: All rules within limits produces :allow**

- Given: two matching rules with `action: :block`, both counters below their respective limits
- When: `check` is called
- Then: the result is `:allow`; the counters for both rules are incremented

**Scenario 6 (failure case): Redis unavailable (fail-open)**

- Given: Redis raises `Redis::CannotConnectError` (or equivalent) on every call
- When: `check` is called
- Then: the result is `:allow`; no exception propagates to the caller; a WARN-level structured log is written with `message: "rate_limit_redis_error"`, `call_site`, and the error class; test assertion: `expect { check(...) }.not_to raise_error`

**Scenario 7 (failure case): Unknown characteristic raises in dev/test**

- Given: the environment is `:development` or `:test`, and a rule specifies `characteristics: [:unknown_key]`, where `:unknown_key` is not registered in `Labkit::RateLimit::KNOWN_CHARACTERISTICS`
- When: `check` is called
- Then: `ArgumentError` is raised with a message naming `:unknown_key`; test: `expect { check(...) }.to raise_error(ArgumentError, /unknown_key/)`

**Scenario 8: Unknown characteristic is sentinel-replaced in production**

- Given: the same setup as Scenario 7, but the environment is `:production`
- When: `check` is called
- Then: no exception is raised; a WARN-level log is written naming the unknown characteristic; the literal string `"unknown_characteristic"` is used as that characteristic's value in the Redis key; the call proceeds

**Scenario 9: Per-rule independent counting (rules do not share counters)**

- Given: two matching rules, rule A at index 0 with `limit: 10, period: 60` and rule B at index 1 with `limit: 5, period: 60`; both have `characteristics: [:user]`
- When: `check` is called 6 times with the same identifier `{ user: 42 }`
- Then: rule A's Redis key is `labkit:rl:rack_request:0:user:42` with count 6 (below 10, `:allow`); rule B's Redis key is `labkit:rl:rack_request:1:user:42` with count 6 (above 5, produces `:block`); the two keys are distinct; deleting rule B's key does not affect rule A's counter

**Scenario 10: Empty match condition matches all identifiers**

- Given: a rule with `match: {}` (empty hash), `limit: 0`, `action: :block`
- When: `check` is called with any identifier
- Then: the result is `:block` (limit 0 means block all); this is correct and expected behavior, not an edge-case error

**Scenario 11: Structured log written per matched rule evaluation**

- Given: a matched rule at index 0 that is evaluated against identifier `{ user: 42, ip: "1.2.3.4" }`
- When: `check` is called
- Then: a structured log entry is written containing `call_site`, `rule_index` (integer), `action` (string), `limit` (integer), `period` (integer), `count` (integer), `matched: true`, `exceeded` (boolean), `identifier` (hash of the characteristic key-value pairs used, e.g. `{"user": 42, "ip": "1.2.3.4"}`), and `redis_key` (the full Redis key for this rule, e.g. `"labkit:rl:rack_request:0:user:42"`)

**Scenario 12 (failure case): Invalid call_site raises in dev/test**

- Given: `call_site: "rack:request"` (contains a colon)
- When: `check` is called in development or test
- Then: `ArgumentError` is raised with a message identifying the invalid `call_site`; test: `expect { check(call_site: "rack:request", ...) }.to raise_error(ArgumentError, /call_site/)`

**Scenario 13: Invalid call_site is sanitized in production**

- Given: `call_site: "rack:request"` and the environment is `:production`
- When: `check` is called
- Then: no exception is raised; the colon is replaced with `_` (producing `"rack_request"`); a WARN-level log is written noting the sanitization; the call proceeds with the sanitized value

**Scenario 14: Long char_value is hashed, not truncated**

- Given: an identifier with `endpoint: ("a" * 300)` (a 300-character string)
- When: `check` is called and the Redis key is constructed
- Then: the Redis key uses the SHA-256 hex digest of the full 300-character string as the `char_value` component, not a truncated prefix; two distinct 300-character strings that share a 256-character prefix produce different Redis keys

### Security Considerations [required]

**Redis key format:** Keys use the format `labkit:rl:{call_site}:{rule_index}:{char_key}:{char_value}`. Including `rule_index` (the 0-based position of the rule in the rules array) ensures that two rules sharing the same characteristics and call site never collide on the same counter.
For multi-characteristic rules (e.g., `characteristics: [:user, :ip]`), one key is written per characteristic independently; the rule's limit applies to whichever characteristic counter is checked first (see Evaluator design).

**Long char_value (hashing, not truncation):** Values longer than 200 characters are replaced with their SHA-256 hex digest before use in a Redis key. Truncating at a fixed length would let two distinct values that share a long prefix collide on the same counter (quota theft); hashing the full value makes such collisions computationally infeasible. The 200-character threshold is implementation-defined; values at or below it are used verbatim.

**call_site validation:** `call_site` must match `/\A[a-z0-9_]+\z/` (lowercase alphanumerics and underscores only). A `call_site` containing `:` or `/` would violate the key format invariant. Dev/test raises `ArgumentError`; production logs a WARN and sanitizes the value by replacing each invalid character with `_`.

**PII in Redis keys and logs:** User IDs (integers) and IP addresses already appear in existing rate limiting Redis keys, and Labkit follows the same model; no new PII categories are introduced. Endpoint paths in identifiers must be normalized (query strings stripped) before use as key components. `identifier` values in structured logs follow the same conventions as existing rate limiting logs.

**Fail-open trade-off:** Explicitly accepted per epic decision. An attacker who disrupts Redis cannot turn rate limit enforcement into a DoS against legitimate traffic; the cost is that limits are not enforced while Redis is unavailable. The WARN log keeps these failures observable.

**Characteristic validation:** Prevents a misconfigured caller from silently creating unbounded Redis keyspaces through unexpected characteristic names. Dev/test raises; production logs a WARN and substitutes the sentinel value.

**No caller-controlled code execution:** Rules are data structures (hashes), not lambdas or procs. Rule matching is limited to attribute equality or range checks: no `eval`, and no dynamic dispatch on caller-supplied values.
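The key-construction rules above (key format, `call_site` validation, and hashing of long values) can be sketched in plain Ruby. This is a minimal illustration, not the labkit-ruby implementation: `KeySketch`, `normalize_call_site`, `key_component`, and `redis_key` are hypothetical names, and passing the environment as an explicit `env:` argument is an assumption (the spec does not say how the environment is detected). The WARN logging and the unknown-characteristic sentinel are omitted.

```ruby
require "digest"

# Hypothetical helper; names and signatures are illustrative, not the labkit-ruby API.
module KeySketch
  CALL_SITE_FORMAT = /\A[a-z0-9_]+\z/
  MAX_VERBATIM_LENGTH = 200 # the spec's implementation-defined threshold

  module_function

  # Dev/test: raise on an invalid call_site. Production: replace each invalid
  # character with "_" (a real implementation would also write a WARN log).
  def normalize_call_site(call_site, env:)
    return call_site if CALL_SITE_FORMAT.match?(call_site)
    raise ArgumentError, "invalid call_site: #{call_site.inspect}" unless env == :production

    call_site.gsub(/[^a-z0-9_]/, "_")
  end

  # Values over the threshold become the SHA-256 hex digest of the *full* value,
  # so two values that share a long prefix still get different key components.
  def key_component(value)
    str = value.to_s
    str.length > MAX_VERBATIM_LENGTH ? Digest::SHA256.hexdigest(str) : str
  end

  # Builds labkit:rl:{call_site}:{rule_index}:{char_key}:{char_value}
  def redis_key(call_site:, rule_index:, char_key:, char_value:, env:)
    site = normalize_call_site(call_site, env: env)
    ["labkit", "rl", site, rule_index, char_key, key_component(char_value)].join(":")
  end
end

puts KeySketch.redis_key(call_site: "rack_request", rule_index: 0, char_key: :user, char_value: 42, env: :test)
puts KeySketch.redis_key(call_site: "rack:request", rule_index: 1, char_key: :user, char_value: 42, env: :production)
```

The two calls print `labkit:rl:rack_request:0:user:42` and `labkit:rl:rack_request:1:user:42`; the second shows production sanitizing `"rack:request"` instead of raising. Hashing is applied to the whole value rather than a prefix, which is what makes the Scenario 14 case (two 300-character strings with a shared 256-character prefix) produce distinct keys.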
### Rollout & Backwards Compatibility [required]

- **New code only:** This stage adds new files to `labkit-ruby`. No existing methods are modified.
- **No Rails integration yet:** The gem API exists but is not called from Rails until Stage 2, so this change has zero production impact. No Redis writes occur until `RateLimit.check` is called from a Rails call site; the gem version bump alone writes no Redis keys.
- **Versioning:** Release a new minor version of `labkit-ruby` following semver (e.g., 0.x to 0.(x+1), or 1.x to 1.(x+1)). The Rails `Gemfile` is not updated until Stage 2.
- **No feature flags required:** This stage is gem-only with no Rails call site, so there is nothing to flag.
- **Self-managed / Dedicated / Cells:** Unaffected, because the new API has no callers until Stage 2. Redis instance selection (shared cache vs. dedicated) is deferred to Stage 2, which will specify the connection target per deployment type.

### Validation Loop / Verification Process [required]

All of the following must pass. Post the full output as a comment on this issue before the MR is marked ready.
```bash
# In labkit-ruby repo:

# Full new spec suite
bundle exec rspec spec/gitlab/rate_limit_spec.rb \
  spec/gitlab/rate_limit/identifier_spec.rb \
  spec/gitlab/rate_limit/rule_spec.rb \
  spec/gitlab/rate_limit/evaluator_spec.rb \
  --format documentation

# Confirm fail-open scenario explicitly
bundle exec rspec spec/gitlab/rate_limit_spec.rb -e "Redis unavailable"

# Confirm characteristic validation in dev/test mode
LABKIT_ENV=test bundle exec rspec spec/gitlab/rate_limit/identifier_spec.rb -e "unknown characteristic"

# Confirm call_site validation
LABKIT_ENV=test bundle exec rspec spec/gitlab/rate_limit_spec.rb -e "invalid call_site"

# Confirm long char_value hashing
bundle exec rspec spec/gitlab/rate_limit/evaluator_spec.rb -e "long char_value"
```

Files to create:

- `lib/gitlab/rate_limit.rb`: `RateLimit.check` entry point
- `lib/gitlab/rate_limit/identifier.rb`: Identifier value object and serialization
- `lib/gitlab/rate_limit/rule.rb`: Rule struct with match, limit, period, action, characteristics
- `lib/gitlab/rate_limit/evaluator.rb`: rule matching, Redis counting, aggregate result
- `spec/gitlab/rate_limit_spec.rb`: integration-level scenarios (Scenarios 1–14)
- `spec/gitlab/rate_limit/identifier_spec.rb`: Identifier unit tests
- `spec/gitlab/rate_limit/rule_spec.rb`: Rule struct unit tests
- `spec/gitlab/rate_limit/evaluator_spec.rb`: Evaluator unit tests, including key format and hashing

Every Given/When/Then scenario maps to at least one `it` block. Tests must catch a no-op implementation: a `check` that always returns `:allow` must fail Scenario 4.
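To make the evaluator behaviour the specs pin down concrete, here is a plain-Ruby model that substitutes an in-memory counter store for Redis. It is a sketch under stated assumptions, not the labkit-ruby design: `SketchEvaluator`, `MemoryStore`, `BrokenStore`, and `StoreUnavailable` are hypothetical; rules are plain hashes as described above; only a rule's first characteristic is counted; `period`/TTL handling and logging are left out; and "exceeded" is taken to mean `count > limit`, consistent with Scenarios 2, 9, and 10.

```ruby
# Hypothetical stand-in for Redis::CannotConnectError.
class StoreUnavailable < StandardError; end

# Hypothetical in-memory substitute for Redis INCR (no expiry/period handling).
class MemoryStore
  def initialize
    @counts = Hash.new(0)
  end

  # Increments and returns the new count, like Redis INCR.
  def incr(key)
    @counts[key] += 1
  end

  def count(key)
    @counts[key]
  end
end

# Store that fails on every call, to exercise the fail-open path.
class BrokenStore
  def incr(_key)
    raise StoreUnavailable, "connection refused"
  end
end

# Illustrative evaluator; not the labkit-ruby API.
class SketchEvaluator
  def initialize(store)
    @store = store
  end

  # Returns :block if any matched :block rule exceeds its limit, else :allow.
  def check(call_site:, identifier:, rules:)
    result = :allow
    rules.each_with_index do |rule, index|
      # Non-matching rules are skipped silently: no counter, no log.
      next unless matches?(rule[:match], identifier)

      # Simplification: only the first characteristic is counted.
      char_key = rule[:characteristics].first
      key = "labkit:rl:#{call_site}:#{index}:#{char_key}:#{identifier[char_key]}"
      count = @store.incr(key) # each matched rule has its own counter
      result = :block if rule[:action] == :block && count > rule[:limit]
    end
    result
  rescue StoreUnavailable
    :allow # fail open; the spec also requires a WARN log here
  end

  private

  # An empty match hash matches every identifier (Scenario 10).
  def matches?(conditions, identifier)
    conditions.all? { |key, value| identifier[key] == value }
  end
end
```

Against this model, a no-op `check` that always returns `:allow` fails Scenario 4: with rule B at `limit: 5`, the sixth call for `{ user: 42 }` must return `:block`, while rule A (`action: :log`) keeps counting on its own key.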
### Observability [optional]

Per-rule evaluation log (written for every matched rule; INFO when within the limit, WARN when exceeded):

```json
{"severity":"INFO","message":"rate_limit_check","call_site":"rack_request","rule_index":0,"action":"block","limit":100,"period":60,"count":42,"matched":true,"exceeded":false,"identifier":{"user":42,"ip":"1.2.3.4"},"redis_key":"labkit:rl:rack_request:0:user:42"}
```

Fail-open log (WARN, written when Redis is unavailable):

```json
{"severity":"WARN","message":"rate_limit_redis_error","call_site":"rack_request","error":"Redis::CannotConnectError","result":"allow"}
```

On-call filters: `message=rate_limit_redis_error` indicates fail-open events; `message=rate_limit_check AND exceeded=true` identifies active throttling.

**On-call key operations:** Given a `rate_limit_check` log entry, use its `redis_key` field directly. Deleting a key resets only that rule's counter; other rules matching the same identifier keep their own counters.

```bash
# Inspect current counter value
redis-cli GET labkit:rl:rack_request:0:user:42

# Check time until natural expiry
redis-cli TTL labkit:rl:rack_request:0:user:42

# Manually unblock a user (e.g., false positive)
redis-cli DEL labkit:rl:rack_request:0:user:42
```
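The per-rule `rate_limit_check` entry can be assembled from values the evaluator already holds. The sketch below assumes a hypothetical `rate_limit_check_entry` helper (not part of labkit-ruby) and treats "exceeded" as `count > limit`; field names and order follow the INFO example in this section, and severity follows the INFO/WARN rule stated there.

```ruby
require "json"

# Hypothetical helper; the name and signature are illustrative only.
# Builds the per-rule log entry for a *matched* rule (non-matching rules log nothing).
def rate_limit_check_entry(call_site:, rule_index:, rule:, count:, identifier:, redis_key:)
  exceeded = count > rule[:limit]
  {
    severity: exceeded ? "WARN" : "INFO", # INFO within the limit, WARN when exceeded
    message: "rate_limit_check",
    call_site: call_site,
    rule_index: rule_index,
    action: rule[:action].to_s, # logged as a string, e.g. "block"
    limit: rule[:limit],
    period: rule[:period],
    count: count,
    matched: true,
    exceeded: exceeded,
    identifier: identifier, # characteristic key-value pairs used
    redis_key: redis_key
  }
end

entry = rate_limit_check_entry(
  call_site: "rack_request",
  rule_index: 0,
  rule: { limit: 100, period: 60, action: :block },
  count: 42,
  identifier: { user: 42, ip: "1.2.3.4" },
  redis_key: "labkit:rl:rack_request:0:user:42"
)
puts JSON.generate(entry)
```

`JSON.generate` emits compact output with keys in insertion order, so this prints the same line as the INFO example above. Keeping `redis_key` in every entry is what lets on-call run `GET`/`TTL`/`DEL` without reconstructing the key by hand.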
