Add Prometheus metrics for labkit rate limit checks
Add Prometheus metrics to `Labkit::RateLimit` to observe rate limiting behavior without additional log volume.
Parent epic: https://gitlab.com/groups/gitlab-com/gl-infra/-/work_items/2021
Context: https://gitlab.com/gitlab-org/ruby/gems/labkit-ruby/-/merge_requests/272#note_3294183874
## Existing metrics inventory
The new labkit metrics must provide equivalent observability to the existing rate limiting metrics so dashboards can be updated to show labkit-based rate limiting alongside or instead of the legacy metrics.
### ApplicationRateLimiter
| Metric | Type | Labels | What it measures |
|---|---|---|---|
| `gitlab_application_rate_limiter_throttle_utilization_ratio` | Histogram (buckets: 0.25, 0.5, 0.75, 1.0) | `throttle_key`, `peek`, `feature_category` | Ratio of current count to threshold. Used in the [Rate Limiting Overview dashboard](https://dashboards.gitlab.net/d/rate-limiting-rate-limiting_overview) via bucket subtraction to compute throttled request rate: `rate(bucket{le="+Inf"}) - rate(bucket{le="1"})`. |
### RackAttack
| Metric | Type | Labels | What it measures |
|---|---|---|---|
| `gitlab_rack_attack_events_total` | Counter | `event_type`, `event_name` | Total RackAttack events (throttle/blocklist/track). Rate of events per throttle name. |
| `gitlab_rack_attack_throttle_limit` | Gauge | `event_name` | Configured limit per throttle. |
| `gitlab_rack_attack_throttle_period_seconds` | Gauge | `event_name` | Configured period per throttle. |
### Dashboard usage
The [Rate Limiting Overview dashboard](https://dashboards.gitlab.net/d/rate-limiting-rate-limiting_overview) (source: `runbooks/dashboards/rate-limiting/main.dashboard.jsonnet`) uses these metrics in the RackAttack and ApplicationRateLimiter sections. No alerts are currently configured on these metrics — they are dashboard-only for observability.
## Proposed labkit metrics
### Counters
| Metric | Labels | Purpose |
|---|---|---|
| `gitlab_labkit_rate_limiter_calls_total` | `rate_limiter`, `rule`, `action` | Incremented on every **successful** `check` call (no Redis errors). The `action` label distinguishes outcomes (see below). Equivalent of `gitlab_rack_attack_events_total`. |
| `gitlab_labkit_rate_limiter_errors_total` | `rate_limiter` | Incremented when a `check` call fails (Redis unavailable, etc.). Separate from `calls_total` to keep successful check metrics clean. |
**`action` label values on `calls_total`:**
| `action` | `rule` | Meaning |
|---|---|---|
| `"block"` | rule name | Rule matched with `action: :block` and count exceeded limit — request blocked |
| `"log"` | rule name | Rule matched with `action: :log` and count exceeded limit — would have blocked, only logged (shadow mode) |
| `"allow"` | rule name | Rule matched but count is within limit — request allowed |
| `"allow"` | `"unmatched"` | No rule matched — request allowed (no rate limit applied) |
**Error handling:** When Redis fails, only `errors_total` is incremented. `calls_total` is NOT incremented for failed checks — it reflects only successful checks where we have a definitive outcome. The error is also visible via `result.error?` and the structured warning log.
**Useful PromQL queries:**
- Total successful calls: `sum(rate(gitlab_labkit_rate_limiter_calls_total[5m]))`
- Blocked rate: `sum(rate(gitlab_labkit_rate_limiter_calls_total{action="block"}[5m]))`
- Would-have-blocked (shadow): `sum(rate(gitlab_labkit_rate_limiter_calls_total{action="log"}[5m]))`
- Unmatched rate: `sum(rate(gitlab_labkit_rate_limiter_calls_total{rule="unmatched"}[5m]))`
- Error rate: `sum(rate(gitlab_labkit_rate_limiter_errors_total[5m]))`
- Error ratio: `sum by (rate_limiter) (rate(gitlab_labkit_rate_limiter_errors_total[5m])) / (sum by (rate_limiter) (rate(gitlab_labkit_rate_limiter_calls_total[5m])) + sum by (rate_limiter) (rate(gitlab_labkit_rate_limiter_errors_total[5m])))` — aggregating by `rate_limiter` is needed because the two counters have different label sets
### Gauges
| Metric | Labels | Multiprocess mode | Purpose |
|---|---|---|---|
| `gitlab_labkit_rate_limiter_limit` | `rate_limiter`, `rule` | `:max` | The configured limit value per rule (resolved from callable if applicable). Equivalent of `gitlab_rack_attack_throttle_limit`. |
| `gitlab_labkit_rate_limiter_period_seconds` | `rate_limiter`, `rule` | `:max` | The configured period per rule (resolved from callable if applicable). Equivalent of `gitlab_rack_attack_throttle_period_seconds`. |
Gauges are only set on successful matched checks (when we have the resolved values).
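Resolving a callable limit before setting the gauge might look like the sketch below. This is illustrative only: `Rule`, `FakeGauge`, and `resolve` are stand-ins for this sketch, not labkit API.

```ruby
# Minimal sketch: a rule's limit/period may be a static value or a callable
# resolved at check time, and the gauges are set with the resolved values.
# Rule and FakeGauge are stand-ins, not real labkit classes.
Rule = Struct.new(:name, :limit, :period, keyword_init: true)

class FakeGauge
  attr_reader :values

  def initialize
    @values = {}
  end

  def set(labels, value)
    @values[labels] = value
  end
end

# A configured value can be a literal or a callable.
def resolve(value)
  value.respond_to?(:call) ? value.call : value
end

limit_gauge  = FakeGauge.new
period_gauge = FakeGauge.new

rule = Rule.new(name: 'ip', limit: -> { 100 }, period: 60)
labels = { rate_limiter: 'api', rule: rule.name }

limit_gauge.set(labels, resolve(rule.limit))   # callable resolves to 100
period_gauge.set(labels, resolve(rule.period)) # static value passes through
```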
### Multiprocess mode for gauges
GitLab runs Puma with multiple workers. Each worker sets the same gauge value (since the configured limit/period is identical across workers). Using `multiprocess_mode: :max` ensures only **one value per label set** is emitted when Prometheus scrapes, avoiding N duplicate copies.
```ruby
Labkit::Metrics::Client.gauge(
:gitlab_labkit_rate_limiter_limit,
'The configured rate limit threshold',
{ rate_limiter: nil, rule: nil },
:max
)
```
The existing RackAttack gauges use the default `:all` mode (per-worker duplicates). The new labkit gauges improve on this.
Reference: `prometheus-client-mmap` gem — gauge multiprocess modes are `:all`, `:liveall`, `:livesum`, `:max`, `:min`. See `lib/prometheus/client/helper/metrics_processing.rb` for merge behavior.
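As a rough illustration of the `:max` merge (heavily simplified — the gem's actual merge logic lives in the file referenced above):

```ruby
# Simplified illustration of how :max collapses per-worker gauge samples
# into one value per label set at scrape time. Not the gem's actual code.
samples = [
  { worker: 1, labels: { rule: 'ip' }, value: 100 },
  { worker: 2, labels: { rule: 'ip' }, value: 100 },
  { worker: 3, labels: { rule: 'ip' }, value: 100 },
]

merged = samples
  .group_by { |s| s[:labels] }
  .transform_values { |group| group.map { |s| s[:value] }.max }

# One series per label set instead of one per worker.
```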
## Implementation plan
### Files to create/modify
| File | Action | Purpose |
|---|---|---|
| `lib/labkit/rate_limit/metrics.rb` | Create | Module with 4 memoized metric accessors (2 counters, 2 gauges) using `Labkit::Metrics::Client` |
| `lib/labkit/rate_limit/evaluator.rb` | Modify | Emit metrics after evaluation: `calls_total` on success, `errors_total` on failure, gauges on match |
| `lib/labkit/rate_limit.rb` | Modify | Add `autoload :Metrics` |
| `spec/labkit/rate_limit/metrics_spec.rb` | Create | Verify metric definitions, types, labels, multiprocess mode |
| `spec/labkit/rate_limit/evaluator_spec.rb` | Modify | Tests for metrics emission: all action label values, error counter, gauges with resolved callable values |
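A possible shape for `lib/labkit/rate_limit/metrics.rb` is sketched below. The `Client` stub stands in for the real `Labkit::Metrics::Client` so the memoized-accessor pattern is runnable on its own; method names and signatures here are assumptions, not the gem's actual API.

```ruby
module Labkit
  module Metrics
    # Stub standing in for the real Labkit::Metrics::Client in this sketch.
    module Client
      Metric = Struct.new(:name, :labels, :multiprocess_mode, keyword_init: true)

      def self.counter(name, _docstring, labels = {})
        Metric.new(name: name, labels: labels)
      end

      def self.gauge(name, _docstring, labels = {}, multiprocess_mode = :all)
        Metric.new(name: name, labels: labels, multiprocess_mode: multiprocess_mode)
      end
    end
  end

  module RateLimit
    # Hypothetical metrics module: memoized accessors so each metric is
    # registered once per process.
    module Metrics
      module_function

      def calls_total
        @calls_total ||= Labkit::Metrics::Client.counter(
          :gitlab_labkit_rate_limiter_calls_total,
          'Total successful rate limit checks',
          { rate_limiter: nil, rule: nil, action: nil }
        )
      end

      def errors_total
        @errors_total ||= Labkit::Metrics::Client.counter(
          :gitlab_labkit_rate_limiter_errors_total,
          'Total failed rate limit checks',
          { rate_limiter: nil }
        )
      end

      def limit
        @limit ||= Labkit::Metrics::Client.gauge(
          :gitlab_labkit_rate_limiter_limit,
          'The configured rate limit threshold',
          { rate_limiter: nil, rule: nil },
          :max
        )
      end

      def period_seconds
        @period_seconds ||= Labkit::Metrics::Client.gauge(
          :gitlab_labkit_rate_limiter_period_seconds,
          'The configured rate limit period in seconds',
          { rate_limiter: nil, rule: nil },
          :max
        )
      end
    end
  end
end
```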
### Metrics emitted in Evaluator
The `Evaluator` has direct access to the matched rule, resolved limit/period, and result. The metrics are emitted at three points:
1. **Successful match:** After `evaluate_rule` succeeds — increment `calls_total` (with action label), set both gauges with resolved limit/period.
2. **No match:** After the rules loop completes with no match — increment `calls_total` with `rule: "unmatched"`, `action: "allow"`.
3. **Error:** In the `rescue` block — increment `errors_total` only. `calls_total` is NOT incremented.
All metric calls are wrapped in their own `rescue StandardError` to ensure metrics emission never breaks the rate limit check.
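The three emission points plus the protective rescue could be sketched as follows. This is a runnable stand-in, not the actual `Evaluator`: `FakeCounter` replaces the real metric objects and `record_check` is a hypothetical helper.

```ruby
# Runnable sketch of the three emission points. FakeCounter and record_check
# are illustrative stand-ins for the real Evaluator and metric classes.
class FakeCounter
  attr_reader :events

  def initialize
    @events = []
  end

  def increment(labels)
    @events << labels
  end
end

CALLS_TOTAL  = FakeCounter.new
ERRORS_TOTAL = FakeCounter.new

def record_check(rate_limiter, matched_rule: nil, rule_action: nil, over_limit: false, error: false)
  if error
    # 3. Error: only errors_total; calls_total reflects definitive outcomes.
    ERRORS_TOTAL.increment(rate_limiter: rate_limiter)
  elsif matched_rule.nil?
    # 2. No rule matched: allowed, rule label set to "unmatched".
    CALLS_TOTAL.increment(rate_limiter: rate_limiter, rule: 'unmatched', action: 'allow')
  else
    # 1. Successful match: action depends on the rule's action and the count.
    action =
      if !over_limit then 'allow'
      elsif rule_action == :block then 'block'
      else 'log' # action: :log over limit — shadow mode
      end
    CALLS_TOTAL.increment(rate_limiter: rate_limiter, rule: matched_rule, action: action)
  end
rescue StandardError
  nil # metrics emission must never break the rate limit check itself
end

record_check('api', matched_rule: 'ip', rule_action: :block, over_limit: true)
record_check('api')              # no rule matched
record_check('api', error: true) # Redis failure
```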
## Future investigation: utilization ratio histogram
The ApplicationRateLimiter emits `gitlab_application_rate_limiter_throttle_utilization_ratio` — a histogram showing how close each rate limit is to its threshold (buckets at 25%, 50%, 75%, 100%). The dashboard uses bucket subtraction to derive the throttled request rate.
With the `gitlab_labkit_rate_limiter_calls_total{action="block"}` counter, we get the throttled rate directly. The question is whether the utilization *distribution* (seeing limits at 75% before they fire) adds enough value to justify the cardinality cost of a histogram.
This requires `current_count` on the result object, which is tracked in https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28785. Once that ships, we can evaluate whether to add:
| Metric | Type | Labels | Buckets |
|---|---|---|---|
| `gitlab_labkit_rate_limiter_utilization_ratio` | Histogram | `rate_limiter`, `rule` | `[0.25, 0.5, 0.75, 1.0]` |
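If the histogram is adopted, the throttled-at-limit rate could be derived with the same bucket-subtraction technique the existing dashboard uses (sketch only; the exact `le` label formatting depends on the client library):

```promql
  sum(rate(gitlab_labkit_rate_limiter_utilization_ratio_bucket{le="+Inf"}[5m]))
- sum(rate(gitlab_labkit_rate_limiter_utilization_ratio_bucket{le="1"}[5m]))
```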
Decision deferred until #28785 is complete and we can assess the need based on dashboard usage.