Draft: [Spec 7] Stage 1b: Rule names — stable identity in keys, logs, and headers (labkit-ruby) (#28787) · Issues · GitLab.com / GitLab Infrastructure Team / Production Engineering

Draft: [Spec 7] Stage 1b: Rule names — stable identity in keys, logs, and headers (labkit-ruby)

## Spec ### Problem Statement [required] Stage 1a (Spec 5 / [labkit-ruby!270](https://gitlab.com/gitlab-org/ruby/gems/labkit-ruby/-/merge_requests/270)) ships the `Labkit::RateLimit.check(call_site:, identifier:, rules:)` entry point, the `Identifier`/`Rule` value objects, and the `Evaluator` that does the per-rule Redis counting. Each matching rule gets an independent counter, with the key shape: ``` labkit:rl:{call_site}:{rule_index}:{char_key}:{char_value} ``` `rule_index` is the rule's 0-based position in the `rules` array. It was introduced in Spec 5 as a uniqueness suffix so that two rules sharing characteristics on the same call site never collide — but it has three problems that this spec addresses: 1. **Counters silently move when a rule is reordered or removed.** Inserting a new rule at index 0 reassigns every existing rule's counter to a new key. In-flight counts are abandoned mid-window, which is invisible at runtime and dangerous during a live rollout (e.g. a SRE adding a temporary `:log` rule to investigate an incident inadvertently resets every other counter for that call site). 2. **Logs and response headers cannot identify which rule fired in human terms.** Spec 5's log payload includes `rule_index: 0`, not a name. The epic's 1b/1c sections explicitly call for a `name` field "for logging and response headers" so that an SRE reading a log or a customer reading `RateLimit-Remaining` can identify *which* rule fired (e.g. `"authenticated_api"` vs `"unauthenticated_api"`). 3. **Spec 6 / Stage 1c (response-header state) has no stable handle to expose.** Its `RuleState` currently carries `rule_index`, which is meaningless to a customer or downstream log consumer. Stage 1b closes this gap by introducing a required `name` attribute on `Rule`, validating its shape, requiring uniqueness within a single `.check` call, and threading it through the Redis key, the structured log payload, and the data passed to Stage 1c's `RuleState`. The `Labkit::RateLimit.check` signature is unchanged; the change is in the rule shape callers pass. This is the "real" Stage 1b from epic [#2021](https://gitlab.com/groups/gitlab-com/gl-infra/-/work_items/2021). Stage 1a (Spec 5) bundled the `RateLimit.check` orchestration alongside the identifier work; this spec finishes the Stage 1b scope from the epic — rule identity — without redoing what Spec 5 already shipped. ### Non-Goals [required] - No changes to `Labkit::RateLimit.check` parameter list (still `call_site:`, `identifier:`, `rules:`) - No changes to `Labkit::RateLimit.check` return type (still `:allow` / `:block` Symbol — Spec 6 / Stage 1c is responsible for the richer `Result` return type) - No changes to `Identifier` or `Evaluator` orchestration logic beyond key construction and log payload - No persistence/migration of existing `rule_index`-keyed counters: the Stage 1a key format has no production callers (Stage 2 has not begun); old counters age out on their natural TTL - No support for a "rule namespace" or hierarchical name (e.g. `api.authenticated_api`); names are flat strings - No support for unnamed rules (legacy / convenience) - No expression-matching, OR-matching, or rule fallback ordering — that remains future work - No Rails integration (Stage 2) ### Acceptance Criteria [required] **Scenario 1: Rule requires `name`** - Given: a rule built without a `name:` keyword - When: `Rule.new(match: { user: 42 }, characteristics: [:user], limit: 100, period: 60, action: :block)` is called - Then: `ArgumentError` is raised with a message containing `"name"`; test: `expect { Rule.new(...) }.to raise_error(ArgumentError, /name/)` **Scenario 2: Valid `name` is preserved on the rule** - Given: `Rule.new(name: "authenticated_api", match: { user: 42 }, characteristics: [:user], limit: 100, period: 60, action: :block)` - When: `rule.name` is read - Then: returns the symbol-equivalent or frozen string `"authenticated_api"` (whichever the spec settles on; tests assert equality against the canonical form documented in the gem's README) **Scenario 3: Invalid `name` raises in dev/test** - Given: `RAILS_ENV` is `"development"` or `"test"`; a rule with `name: "Authenticated API"` (contains uppercase and space) - When: `Rule.new(...)` (or `.check` with that rule in the array) is called - Then: `ArgumentError` is raised with a message naming the invalid `name` value and the format requirement; test: `expect { Rule.new(name: "Authenticated API", ...) }.to raise_error(ArgumentError, /name/)` **Scenario 4: Invalid `name` is sanitized in production** - Given: `RAILS_ENV` is `"production"` (or unset / not in `[development, test]`); a rule with `name: "Authenticated API!"` - When: `.check` is called with that rule in the array - Then: no exception; the name is sanitized by replacing characters not matching `/\A[a-z0-9_]+\z/` with `_` and lowercasing, producing `"authenticated_api_"`; a WARN-level structured log entry is written with `message: "rate_limit_invalid_rule_name"`, the original name, the sanitized name, and the `call_site`; the call proceeds with the sanitized name in the Redis key and log payload **Scenario 5: Duplicate `name`s within one `.check` call raise in dev/test** - Given: `RAILS_ENV` is `"test"`; a rules array with two rules sharing `name: "authenticated_api"` - When: `.check` is called with that array - Then: `ArgumentError` is raised with a message containing the duplicated name; test: `expect { check(rules: [r1, r1_dup]) }.to raise_error(ArgumentError, /authenticated_api/)` **Scenario 6: Duplicate `name`s in production drop later occurrences** - Given: `RAILS_ENV` is `"production"`; a rules array with two rules sharing `name: "authenticated_api"` (call them `r_first` and `r_dup`, in that order) - When: `.check` is called - Then: no exception; a WARN-level log with `message: "rate_limit_duplicate_rule_name"` is written naming the duplicated name and recording that one occurrence was dropped; only `r_first` is evaluated and only its counter is incremented; `r_dup` is silently skipped. Rationale: silently suffixing duplicates produces an unstable name (it depends on array ordering) and would surface in customer-facing response headers, leaking internal config layout. Dropping is loud in logs, deterministic, and avoids quota sharing. **Scenario 7: Redis key uses `name` not index** - Given: a rule `{ name: "authenticated_api", match: { user: 42 }, characteristics: [:user], limit: 100, period: 60, action: :block }` at any position in the rules array - When: `.check` is called with `call_site: "rack_request"` and identifier `{ user: 42 }` - Then: the Redis key incremented is `labkit:rl:rack_request:authenticated_api:user:42`; specifically, the key does NOT include the integer rule index **Scenario 8: Reordering rules in the array does not move counters** - Given: a rules array `[r_a, r_b]` where `r_a.name = "first"` and `r_b.name = "second"`; `.check` is called once with this array, incrementing both `r_a` and `r_b`'s counters by 1 - When: `.check` is called a second time with the array reordered to `[r_b, r_a]` - Then: the key for `r_a` (`labkit:rl:rack_request:first:user:42`) has count 2 (not reset by the reordering); the key for `r_b` has count 2 likewise; no key with `:0:` or `:1:` segments exists **Scenario 9: Two rules with the same characteristics but different names do not collide** - Given: rules array with `r_a = { name: "rule_a", characteristics: [:user], ... }` and `r_b = { name: "rule_b", characteristics: [:user], ... }`; both match - When: `.check` is called with identifier `{ user: 42 }` - Then: two distinct keys are incremented: `labkit:rl:rack_request:rule_a:user:42` and `labkit:rl:rack_request:rule_b:user:42`; deleting `rule_a`'s key does not affect `rule_b`'s counter **Scenario 10: Structured log includes `rule_name`** - Given: a matched rule with `name: "authenticated_api"` and `action: :block`, count below limit - When: `.check` is called - Then: an INFO-level structured log entry is emitted with `rule_name: "authenticated_api"` (in addition to the fields already in Spec 5's log payload — `call_site`, `action`, `limit`, `period`, `count`, `matched`, `exceeded`, `identifier`, `redis_key`); `rule_index` is no longer present in the payload **Scenario 11: WARN log when a `:block` rule is exceeded includes `rule_name`** - Given: a matched rule with `name: "authenticated_api"`, `action: :block`, count above limit - When: `.check` is called - Then: a WARN-level structured log entry is emitted with `rule_name: "authenticated_api"`, `exceeded: true`, and `severity: "WARN"`; the aggregate return value is `:block` **Scenario 12 (failure case): `name` is not a String or Symbol** - Given: a rule built with `name: 42` (Integer) - When: `Rule.new(...)` is called - Then: `ArgumentError` is raised with a message indicating `name` must be a `String` or `Symbol`; test: `expect { Rule.new(name: 42, ...) }.to raise_error(ArgumentError, /name/)` **Scenario 13 (failure case): `name` exceeds maximum length** - Given: a rule with `name:` set to a 65-character string of valid characters - When: `Rule.new(...)` is called in `RAILS_ENV=test` - Then: `ArgumentError` is raised; the maximum length is **64 characters**, chosen so that the human-readable segment of the Redis key fits well within typical observability tool truncation limits (the surrounding key shape — `labkit:rl:` + a ~32-char `call_site` + `:` + 64-char name + `:` + char_key + `:` + char_value (≤200 raw or 64-char SHA-256 hex) — stays under 256 bytes for the common case). Production sanitizes by truncating to 64 and emitting a WARN. **Scenario 14: Sanitized name is still unique-checked** - Given: `RAILS_ENV=production`; rules array `[{ name: "Foo!" }, { name: "foo_" }]` — after sanitization, both become `"foo_"` - When: `.check` is called - Then: this is treated as a duplicate (Scenario 6 path) — the second occurrence is dropped with a WARN. The Redis key for the surviving rule is `labkit:rl:rack_request:foo_:user:42`. ### Security Considerations [required] - **Redis key surface:** `name` becomes part of the Redis key. The `/\A[a-z0-9_]+\z/` validation prevents key-format injection (no `:` separator characters; no `*` glob characters that could mis-target `KEYS`/`SCAN`). The 64-character limit prevents unbounded key growth from a misconfigured caller. - **Log injection:** `name` appears in structured log payloads. Same validation applies — no control characters, no quotes, no JSON-breaking content. The sanitization-in-production path runs *before* the value reaches the log output; the original (unsanitized) value is included in the `rate_limit_invalid_rule_name` WARN entry as a separate `original_name` field so log consumers can still see what the caller passed. - **Quota theft via duplicate names:** Two rules with the same `name` (after sanitization) would share a counter, allowing one rule's traffic to consume another rule's quota. The dev/test raise path makes this loud during development; the production drop-with-WARN path eliminates the sharing without raising and without producing an unstable disambiguated name that would leak into customer-facing headers. - **PII in names:** Names are caller-supplied static strings (e.g. `"authenticated_api"`), not derived from request data. There is no path by which user input can become part of a `name`. This is documented in the gem README and enforced socially, not technically — the same way `call_site` is treated. - **Backward-compatibility key collision:** The new key shape (`:{name}:`) and the old key shape (`:{rule_index}:`) could in principle collide if a caller chose an all-numeric `name`. This is not prevented by validation — it is acceptable because no production caller exists yet (Stage 2 has not started), so there are no live `rule_index`-shaped counters to collide with. Once Stage 2 begins, callers should use descriptive names; this is a README convention, not a code-enforced rule. ### Rollout & Backwards Compatibility [required] - **Depends on Spec 5 / MR [!270](https://gitlab.com/gitlab-org/ruby/gems/labkit-ruby/-/merge_requests/270) being merged first.** This spec edits files (`rule.rb`, `evaluator.rb`, README) that Spec 5 introduces. If !270 is still in review when implementation of this spec starts, this spec's MR is rebased onto Spec 5's branch. - **No production callers exist.** Spec 5 is gem-only; Stage 2 (Rails integration) has not started. Therefore no live Redis counters use the old key shape, and no live log consumers parse `rule_index`. The `rule_index → name` switch is a clean replacement, not a migration. - **Self-managed / Dedicated / Cells:** Unaffected — gem is not loaded from any new callsite until Stage 2. The new gem version is a minor bump on the 0.x line. - **Versioning:** Minor version bump of `labkit-ruby`. Rails `Gemfile` is not updated until Stage 2. - **No feature flag required:** Gem-level change with no live callers; nothing to flag. - **README update:** The `lib/labkit/rate_limit/README.md` "Counter key construction" section is updated to use `name` in place of `rule_index`. The "Multi-characteristic rules: OR semantics" diagram is updated. The "What is intentionally not here" section's "Response-header state" bullet remains; "Rule names" is removed from any future-work list since this spec lands them. - **Stage 1c (Spec 6 / 28785) coordination:** Spec 6's `RuleState` is currently spec'd with `rule_index`. The clean ordering is: Spec 6's body is updated to carry `name` instead of `rule_index` *before* Spec 6's implementation begins. If Spec 7 is posted before Spec 6 implementation starts, leave a coordinating note on 28785 referencing this spec so the next person picking up Spec 6 sees it. If Spec 6 has already started, this spec rebases onto Spec 6's branch and `RuleState` is updated as part of this MR. ### Validation Loop / Verification Process [required] All of the following must pass on the implementation MR. Full output posted as a comment on the issue before the MR is marked ready for human review. ```bash # In labkit-ruby repo, after rebasing onto Spec 5's branch (or master if !270 is merged): # Full rate-limit suite mise exec -- bundle exec rspec \ spec/labkit/rate_limit_spec.rb \ spec/labkit/rate_limit/identifier_spec.rb \ spec/labkit/rate_limit/rule_spec.rb \ spec/labkit/rate_limit/evaluator_spec.rb \ --format documentation # Required-name validation RAILS_ENV=test mise exec -- bundle exec rspec spec/labkit/rate_limit/rule_spec.rb -e "name is required" # Format validation: dev/test raises, production sanitizes RAILS_ENV=test mise exec -- bundle exec rspec spec/labkit/rate_limit/rule_spec.rb -e "invalid name" RAILS_ENV=production mise exec -- bundle exec rspec spec/labkit/rate_limit_spec.rb -e "invalid name sanitized" # Duplicate-name handling RAILS_ENV=test mise exec -- bundle exec rspec spec/labkit/rate_limit_spec.rb -e "duplicate rule names raise" RAILS_ENV=production mise exec -- bundle exec rspec spec/labkit/rate_limit_spec.rb -e "duplicate rule names dropped" # Reorder-stability — the central regression test mise exec -- bundle exec rspec spec/labkit/rate_limit/evaluator_spec.rb -e "reordering rules does not move counters" # Rubocop mise exec -- bundle exec rubocop \ lib/labkit/rate_limit/rule.rb \ lib/labkit/rate_limit/evaluator.rb \ spec/labkit/rate_limit/rule_spec.rb \ spec/labkit/rate_limit/evaluator_spec.rb \ spec/labkit/rate_limit_spec.rb ``` Files to modify: - `lib/labkit/rate_limit/rule.rb` — add `name` to the `Struct.new(...)` keyword list; add validation in `initialize` - `lib/labkit/rate_limit/evaluator.rb` — replace `rule_index` with rule `name` in `build_redis_key`; replace `rule_index` with `rule_name` in the log payload; add the duplicate-name check before the rule-evaluation loop - `lib/labkit/rate_limit/README.md` — update key shape documentation, examples, and diagrams; remove any reference to `rule_index` in the on-call section - `lib/labkit/rate_limit.rb` — if `.check` itself orchestrates duplicate-name detection (preferred — keeps `Evaluator` focused on counting), add it here Files to create: - (none — extends Spec 5's existing files) Every Given/When/Then scenario maps to at least one `it` block. Tests must catch a no-op implementation: a `Rule` that ignores `name:` must fail Scenarios 7 and 9; an evaluator that retains `rule_index` in keys must fail Scenarios 7 and 8. ### Observability [optional] Per-rule evaluation log payload after this spec lands (changes from Spec 5 marked with `←`): ```json { "severity": "INFO", "message": "rate_limit_check", "call_site": "rack_request", "rule_name": "authenticated_api", // ← new (replaces rule_index) "action": "block", "limit": 100, "period": 60, "count": 42, "matched": true, "exceeded": false, "identifier": { "user": 42, "ip": "1.2.3.4" }, "redis_key": "labkit:rl:rack_request:authenticated_api:user:42" } ``` New WARN events introduced by this spec: ```json { "severity": "WARN", "message": "rate_limit_invalid_rule_name", "call_site": "rack_request", "original_name": "Authenticated API!", "sanitized_name": "authenticated_api_" } { "severity": "WARN", "message": "rate_limit_duplicate_rule_name", "call_site": "rack_request", "name": "authenticated_api", "dropped_occurrence": 2 } ``` On-call filters: - `message=rate_limit_check AND rule_name="authenticated_api" AND exceeded=true` — find requests blocked by a specific rule - `message=rate_limit_invalid_rule_name` — find misconfigured callers - `message=rate_limit_duplicate_rule_name` — find accidental rule duplication (e.g. a feature flag adding a rule that already exists statically); the `dropped_occurrence` field tells you which 1-based position in the rules array was discarded On-call key inspection — given a `rate_limit_check` event, the `redis_key` field is still directly usable; nothing changes about the inspection workflow except that the second-to-last key segment is now a human-readable name rather than an integer index. ---

issue