# Phase 2: Rate Limiting Simplification — Execution Plan
## Context
### Where this fits
This epic covers **Phase 2** of the [Simplifying Rate Limiting Configuration](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/rate_limiting_simplification/#phase-2-simplify-application-level-configuration) design document. It builds on the [Next Rate Limiting Architecture](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/rate_limiting/#framework-to-define-and-enforce-limits) blueprint's vision for a framework to define and enforce limits.
- **Phase 1** (edge network + bypass config): Completed May 2025.
- **Phase 2** (this epic): Simplify and unify application-level rate limiting configuration.
- **Phase 3** (future): Rate Limiting Interface — a centralized UI/SSOT for managing all limits.
Phase 2 was paused in July 2025 because the GATE (GitLab Adaptive Trust Environment) auth architecture was expected to change where enforcement lives. GATE has since been defined. The **strategic decision is to "go first"**: the unified configuration interface defined in Phase 2 becomes the contract that GATE and future services must conform to, not the other way around.
### The problem
GitLab's application rate limiting is fragmented across multiple implementations, each with different configuration mechanisms:
| Implementation | How limits are configured | Dry run support | Bypass support |
|----------------|---------------------------|-----------------|----------------|
| **RackAttack** | Mix of hardcoded values and `ApplicationSetting` DB columns. Env var `GITLAB_THROTTLE_DRY_RUN` for dry run. Env var `GITLAB_THROTTLE_USER_ALLOWLIST` for bypass. | Per-throttle via env var | Per-user via env var |
| **ApplicationRateLimiter** | Some via Application Settings UI, some via API, some hardcoded in [`rate_limits` hash](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/application_rate_limiter.rb). | Inconsistent | Inconsistent |
| **5+ other limiter types** | Various — see [the full list](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/rate_limiting_simplification/#phase-2-simplify-application-level-configuration). | Varies | Varies |
This means:
- **You can't configure all rate limits the same way.** Some require code changes, some need env vars, some have an API, some have a UI. There is no single answer to "how do I change a rate limit?"
- **Dry run and bypass behavior is inconsistent.** Whether you can test a rate limit change safely depends on which limiter implementation it uses.
- **New endpoints don't get rate limits by default.** Services can ship without rate limits, and adding them after the fact risks user impact.
- **It's hard to know what's rate limited and why.** During incidents, engineers have to check multiple places. For customers, understanding why they're being limited is equally difficult.
### What Phase 2 is — and what it is NOT
**Phase 2 is about fixing the fundamentals of how rate limits are defined, identified, and counted in the application code.** The core deliverable is a **rate limit identifier design** and a **unified counting/logging API in `labkit-ruby`** that all rate limiting in GitLab flows through.
Each rate limit check has two pieces of identity: a **call site name** (`rate_limiter`) that identifies *what is being rate limited* (e.g., `"rack_request"` for the middleware, `"create_project"` for an application action), and a **request identifier** — a key-value object describing *the context of the request* (user, IP, namespace, plan, endpoint). Together they are:
1. What appears in **logs** for every request (so engineers and support can understand what happened)
2. What is used to derive the **Redis counting key** (so we count the right thing)
3. What a **future external service** will use for lookups (so we can manage limits centrally in a later phase)
The call site name is provided as a separate argument and is always prepended to counter keys, ensuring different call sites never share counters. The identifier shape varies per call site — the rack middleware provides request type, IP, endpoint, namespace; an application action provides user, project, namespace.
In this phase, the **caller** (rack middleware or application code) is responsible for constructing the call site name and identifier, and passing them to labkit along with a set of **rules**. Each rule specifies a match condition (key-value pairs that must be present in the identifier), characteristics (what to count by), limit, period, and action (`block` or `log`). Labkit evaluates rules in order and applies **first-match-wins** — only the first matching rule is counted, and its result is returned. Rules must be ordered from most specific to least specific by the caller. This is a lightweight precursor to the full expression-matching engine that will come with the external service in a future phase — the caller still provides the rules, but labkit handles the matching and counting.
In a future phase, an **external service** — inspired by [Cloudflare's rate limiting model](https://developers.cloudflare.com/waf/rate-limiting-rules/parameters/) — will receive the call site name and identifier, and provide the rules, replacing what the caller passes in. The rules structure in this phase is designed to be forward-compatible with that.
#### Concrete examples
To recap: each check passes a **call site name** (`rate_limiter`), a **request identifier** (key-value object), and a set of rules ordered most-specific-first, and labkit evaluates the rules with **first-match-wins** — only the first matching rule is counted. The caller may make multiple `check` calls per request for different concerns (e.g., per-IP and per-user limits as separate rate limiters).
##### Rack middleware example
Consider a request to `GET /api/v4/:id/merge_requests` by a signed-in user (ID 123) in namespace `gitlab-com/gl-infra` (Premium plan). The rack middleware makes separate `check` calls for different rate limiting concerns, each with its own call site name and rules ordered most-specific-first:
```ruby
ip_result = Labkit::RateLimiting.check(
  rate_limiter: "rack_request_ip",
  identifier: {
    request_type: "api",
    ip: "203.0.113.42",
    endpoint: "GET /api/v4/:id/merge_requests"
  },
  rules: [
    {
      name: "merge_requests_api",
      match: { endpoint: "GET /api/v4/:id/merge_requests" },
      characteristics: [:ip],
      limit: 10,
      period: 1.minute,
      action: :log
    },
    {
      name: "generic_api",
      match: { request_type: "api" },
      characteristics: [:ip],
      limit: 2,
      period: 1.minute,
      action: :log
    }
  ]
)

user_result = Labkit::RateLimiting.check(
  rate_limiter: "rack_request_user",
  identifier: {
    request_type: "api",
    user: 123,
    root_namespace_path: "gitlab-com/gl-infra",
    namespace_plan: "premium",
    endpoint: "GET /api/v4/:id/merge_requests"
  },
  rules: [
    {
      name: "merge_requests_api",
      match: { endpoint: "GET /api/v4/:id/merge_requests" },
      characteristics: [:user],
      limit: 20,
      period: 1.minute,
      action: :log
    },
    {
      name: "generic_api",
      match: { request_type: "api" },
      characteristics: [:user],
      limit: 4,
      period: 1.minute,
      action: :log
    }
  ]
)
```
In each call, rules are evaluated in order — **first match wins**. For the per-IP check, the `merge_requests_api` rule matches first (specific endpoint), so the `generic_api` fallback is not evaluated. The result object contains `matched?`, `exceeded?`, `action`, and the matched `rule`.
This mirrors how RackAttack works today — separate throttle definitions for per-IP and per-user limits. Each is a separate rate limiter with its own counters. The rules start in `action: :log` — counting and logging alongside the existing RackAttack enforcement.
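As a sketch of how the middleware might consume these results while in log mode (`log_would_throttle` and `RateLimitedError` are illustrative names, not part of the proposed API):

```ruby
# Illustrative only: act on both check results. While all rules use
# action: :log, result.action is never :block, so nothing is rejected.
[ip_result, user_result].each do |result|
  case result.action
  when :block
    raise RateLimitedError, result.rule.name # enforcement path; unused while rules are :log
  when :log
    log_would_throttle(result) # shadow mode: record the would-be block
  end
  # :allow falls through — within limits, no match, or Redis error
end
```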
##### ApplicationRateLimiter example
Application-level rate limits have a different identifier shape. Consider `ApplicationRateLimiter.throttled?(:pipelines_create, scope: [user, project])` — a user creating a pipeline. The call site name is the action being rate limited, and the identifier carries the request context:
```ruby
result = Labkit::RateLimiting.check(
  rate_limiter: "pipelines_create",
  identifier: {
    user: 42,
    project: 789,
    root_namespace_path: "gitlab-com/gl-infra",
    namespace_plan: "premium"
  },
  rules: [
    {
      name: "pipelines_create",
      characteristics: [:user, :project],
      limit: 25,
      period: 1.minute,
      action: :block
    }
  ]
)
```
Counter key: `pipelines_create:user:42:project:789`
A simpler application-level limit like `user_sign_in`:
```ruby
result = Labkit::RateLimiting.check(
  rate_limiter: "user_sign_in",
  identifier: {
    user: 42
  },
  rules: [
    {
      name: "user_sign_in",
      characteristics: [:user],
      limit: 5,
      period: 10.minutes,
      action: :block
    }
  ]
)
```
Counter key: `user_sign_in:user:42`
Note how the rack middleware and application rate limiters have **different identifier shapes** — the rack middleware includes `request_type`, `ip`, `endpoint`, while the application limiter includes `project`. The call site name (`rate_limiter`) is always prepended to counter keys, ensuring different call sites never share counters. The result object is the same in both cases: `result.matched?`, `result.exceeded?`, `result.action`, `result.rule`, `result.error?`, `result.resolved_limit`, `result.resolved_period`.
##### What the identifier enables in the future
The identifier carries enough information to support future expression-based rule matching (where an external service determines the characteristics, limit, and action). For example, these are the kinds of rules the identifier design enables:
| Rule expression (what it matches) | Characteristics (what to count by) | Threshold |
|---|---|---|
| `rate_limiter == "rack_request" AND endpoint == "GET /api/v4/:id/merge_requests"` | `[:ip]` (anonymous) | 10/min |
| `rate_limiter == "rack_request" AND endpoint == "GET /api/v4/:id/merge_requests"` | `[:user]` (authenticated) | 20/min |
| `rate_limiter == "rack_request" AND endpoint == "GET /api/v4/:id/merge_requests" AND namespace_plan == "premium"` | `[:user]` (authenticated + plan increase) | 30/min |
| `rate_limiter == "rack_request" AND request_type == "api"` (generic API fallback) | `[:ip]` (anonymous) | 2/min |
| `rate_limiter == "rack_request" AND request_type == "api"` (generic API fallback) | `[:user]` (authenticated) | 4/min |
| `rate_limiter == "rack_request" AND request_type == "api" AND namespace_plan == "premium"` | `[:user]` (authenticated + plan increase) | 8/min |
| `rate_limiter == "rack_request" AND request_type == "web"` (generic web fallback) | `[:ip]` (anonymous) | 4/min |
| `rate_limiter == "rack_request" AND request_type == "web"` (generic web fallback) | `[:user]` (authenticated) | 8/min |
| `rate_limiter == "rack_request" AND request_type == "web" AND namespace_plan == "premium"` | `[:user]` (authenticated + plan increase) | 16/min |
| `rate_limiter == "pipelines_create"` (default) | `[:user, :project]` | 25/min |
| `rate_limiter == "pipelines_create" AND namespace_plan == "premium"` | `[:user, :project]` | 50/min |
| `rate_limiter == "pipelines_create" AND root_namespace_path == "gitlab-com/gl-infra"` | `[:user, :project]` | 100/min (custom override) |
With expression matching, a request to `GET /api/v4/:id/merge_requests` by an authenticated user would automatically get the 20/min limit (specific endpoint match). A request to a different API endpoint without a custom rule would get the 4/min generic API limit (fallback). This hierarchical matching is a future capability — in this phase, the caller determines the correct limit.
#### What is in scope for this iteration
1. **Test coverage** of existing rate limiting behavior (prerequisite — Stage 0)
2. **The labkit-ruby API**: identifier design, first-match-wins rule evaluation, counting/logging layer, minimal result object; response header state enrichment tracked separately in [#28785](https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28785) (Stage 1)
3. **Migrating ApplicationRateLimiter** to use the labkit API (Stage 2a, feature-flagged)
4. **New rack middleware** alongside RackAttack, using labkit, starting in log mode (Stage 2b, feature-flagged)
5. **Consistent response headers and default rate limits** for all endpoints (Stage 2c, 2d)
#### What is explicitly out of scope (but designed for)
- **Full expression-matching rule engine**: Labkit accepts simplified rules from the caller with basic key-value matching. The full expression engine (hierarchical fallback, complex predicates) and the external service that provides rules are future work. The rules structure in this phase is designed to be forward-compatible with both.
- **Tiered rate limit enforcement**: The identifier includes `namespace_plan`, making tier-based differentiation possible in the future. This phase establishes the possibility; a future phase activates it.
- **External service integration**: In a future phase, labkit will internally call an external service that performs expression matching against the identifier and returns the characteristics, limit, period, and action. This external service will take precedence over what the caller passes in. The identifier we design in this phase is the lookup key for that service.
- **YAML schema**: The schema is a downstream artifact that should be designed _after_ the underlying API is proven.
- **labkit-go**: Out of scope for this implementation plan.
#### Configuration passthrough for backwards compatibility
A key constraint is that **we cannot introduce breaking changes for self-managed installations**. Any configuration that existing GitLab installs have made (via ApplicationSettings, env vars, API) must continue to work.
The approach: **configuration is passed at the call site**. The caller (rack middleware or application code) resolves the limit, period, characteristics, and action from existing sources (ApplicationSettings, env vars, hardcoded defaults) and passes them directly to labkit when checking a rate limit (see the sketch after the list below). Labkit does not own configuration loading — it receives everything it needs per call.
This means:
- Self-managed installs keep working: existing DB-backed settings are used by the caller to determine limits
- GitLab.com keeps working: same mechanism, with the existing production values
- No limits change unless explicitly reconfigured
- In a future phase, labkit gains the ability to fetch config from an external service _internally_. The external service performs expression matching against the identifier and returns the characteristics, limit, period, and action — taking precedence over what the caller passes in. This is how we'll support per-customer and per-tier customization at scale, but it's out of scope now.
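For example, resolution for the authenticated-API throttle could look like this sketch, which reads the existing `throttle_authenticated_api_*` application settings and the `GITLAB_THROTTLE_DRY_RUN` list (the helper name and exact wiring are assumptions):

```ruby
# Sketch: the caller resolves the rule from existing config sources and
# hands the resolved values to labkit. Helper name is hypothetical.
def authenticated_api_rule
  settings = Gitlab::CurrentSettings.current_application_settings
  dry_run = ENV.fetch("GITLAB_THROTTLE_DRY_RUN", "")
              .split(",").map(&:strip)
              .include?("throttle_authenticated_api")

  {
    name: "generic_api",
    match: { request_type: "api" },
    characteristics: [:user],
    limit: settings.throttle_authenticated_api_requests_per_period, # existing DB column
    period: settings.throttle_authenticated_api_period_in_seconds.seconds,
    action: dry_run ? :log : :block # existing dry-run list maps to :log
  }
end
```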
### Key principles
- **Test coverage first.** Before any code changes touch rate limiting, we need comprehensive test coverage of existing behavior. This is the safety net that allows us to move fast without breaking things.
- **Identifier-first design.** The rate limit identifier is the foundational contract that all other components (counting, configuration, logging, external service) depend on. Getting this right is the primary goal of this iteration.
- **Backwards compatibility for self-managed.** Existing configuration methods (API, Application Settings UI, env vars) must continue to work. We expand the available mechanisms; we don't remove existing ones.
- **Classify changes as "gitlab-com only" vs "affects self-managed."** Changes that affect self-managed customers carry higher risk and need more careful rollout planning. Breaking changes must go through the deprecation process.
- **Consistent behavior regardless of limiter.** The action outcome (`:block`, `:log`, or `:allow`), response headers, and logging should behave the same way regardless of which rate limiting implementation is used under the hood.
- **Sensible defaults for new endpoints.** Any new endpoint should get a reasonable default rate limit automatically, overridable as needed.
- **Feature-flagged rollout.** Migration of existing limiters (RackAttack, ApplicationRateLimiter) to the new labkit API must be behind feature flags, allowing gradual rollout and safe rollback.
---
## Work Breakdown
This iteration focuses on two repos:
- `gitlab-org/labkit-ruby` — the unified rate limit API, identifier design, counting/logging layer
- `gitlab-org/gitlab` — Rails monolith (RackAttack + ApplicationRateLimiter), consuming the labkit-ruby interface
Future iterations will extend to:
- `gitlab-com/kinds/rate-limits` — YAML schema
- `gitlab-org/gitaly` — Gitaly adaptive rate limiting
- Docs/design in `gitlab-com/content-sites/handbook`
**labkit-go is out of scope for this implementation plan.**
**Existing state:**
- Issues #26761 (design doc) and #26780 (limits audit — 25 limits documented) are **closed**
- Issue #26769 (rate-limits schema project) is **open**, 4/6 criteria done: project created, initialized, semantic-release configured. Still needed: schema published to Pages, first schema version released
- Issues #26770 (monolith YAML support), #26772 (Helm chart), #26773 (migration), #26774 (docs) are **open and blocked** pending foundation work below
---
### Stage 0 — Test Coverage Baseline
_Prerequisite for all subsequent stages. Nothing ships without this._
Before making any changes to rate limiting code — especially via agentic development — we need confidence that existing behavior is well-tested. An agent modifying rate limiting code without comprehensive test coverage is a recipe for silent regressions in a system that protects platform stability.
**Steps:**
- **0a.** Audit existing test coverage for `Gitlab::RackAttack` — identify which throttles have integration/request specs and which don't
- **0b.** Audit existing test coverage for `Gitlab::ApplicationRateLimiter` — same analysis
- **0c.** Write missing tests for critical rate limiting paths — focus on: throttle triggers correctly, dry run mode logs but doesn't block, bypass/allowlist skips throttling, response headers are correct
- **0d.** Establish a coverage baseline metric — so we can track that subsequent changes don't reduce coverage
**Verification:** Test suite passes. Coverage report shows all RackAttack throttles and ApplicationRateLimiter limits have at least one integration test covering the enforce path and the bypass/dry-run path.
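For illustration, a request spec of the kind 0c adds might look like the following sketch (it leans on existing GitLab spec helpers such as `stub_env` and `stub_application_setting`; the exact throttle and endpoint are placeholders):

```ruby
# Sketch: verify that dry-run mode logs but does not block.
RSpec.describe "unauthenticated API throttle", type: :request do
  before do
    stub_env("GITLAB_THROTTLE_DRY_RUN", "throttle_unauthenticated_api")
    stub_application_setting(
      throttle_unauthenticated_api_enabled: true,
      throttle_unauthenticated_api_requests_per_period: 1,
      throttle_unauthenticated_api_period_in_seconds: 60
    )
  end

  it "does not return 429 when over the limit in dry-run mode" do
    2.times { get "/api/v4/projects" }
    expect(response).not_to have_gitlab_http_status(:too_many_requests)
  end
end
```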
---
### Stage 1 — labkit-ruby: Rate Limit API and Identifier Design
_Prerequisite for Stage 2. The core deliverable of this iteration._
Implement a unified rate limit API in `gitlab-org/labkit-ruby` with a well-defined identifier and a counting/logging layer. This provides the interface that `gitlab-rails` and other labkit consumers adopt, establishing how rate limiting is identified, counted, and observed across GitLab.
#### 1a. Call site name and request identifier
Each rate limit check has two pieces of identity:
**Call site name** (`rate_limiter`) — identifies *what is being rate limited*. This is the name of the rate limiting checkpoint. The rack middleware uses a single name like `"rack_request"`. Application-level limits use one name per action (e.g., `"create_project"`, `"user_sign_in"`). The call site name is provided as a **separate argument** to the API (not inside the identifier hash). It is always prepended to counter keys, ensuring different call sites never share counters. Rules are scoped to a call site name.
**Request identifier** — describes *the context of the request*. This is a key-value object encoding who is making the request (user, IP), where it's going (endpoint, namespace), and what their entitlements are (plan). The identifier shape varies per call site.
Together, the call site name and identifier are:
- **Logged** on every request (structured logging, human and machine readable)
- **Used to derive the Redis counting key** (call site name is prepended, then characteristics select which identifier keys form the rest of the key)
- **The lookup key for a future external service** (must be stable and extensible)
**Different call sites have different identifier shapes.** The rack middleware identifier includes dimensions like `request_type`, `ip`, `endpoint`. An application-level action like `create_project` includes `user`, `root_namespace_path`, `namespace_plan` instead.
Rack middleware example — call site `"rack_request"`:
```ruby
{
  request_type: "api",
  user: 123,
  ip: "203.0.113.42",
  root_namespace_path: "gitlab-com/gl-infra",
  namespace_plan: "premium",
  endpoint: "GET /api/v4/:id/merge_requests"
}
```
Application rate limiter example — call site `"create_project"`:
```ruby
{
  user: 42,
  root_namespace_path: "gitlab-com/gl-infra",
  namespace_plan: "premium"
}
```
#### 1b. Call-site API
In this phase, the caller passes the call site name, the identifier, and a set of rules:
- **rate_limiter** — the call site name, always prepended to counter keys
- **identifier** — the key-value object describing the request context
- **rules** — array of rules, each with:
- `name` — human-readable name for logging and response headers
- `match` — hash of key-value pairs that must all be present in the identifier for the rule to apply
- `characteristics` — which identifier keys to count by (the call site name is always implicitly prepended)
- `limit` — the threshold (resolved by the caller from existing config)
- `period` — the time window (resolved by the caller from existing config)
- `action` — `:block` (enforce the limit) or `:log` (count and log but don't block). This is the rule's configured action; the result object's `action` field reflects the outcome (`:block`, `:log`, or `:allow`)
```ruby
result = Labkit::RateLimiting.check(
  rate_limiter: "rack_request_user",
  identifier: {
    request_type: "api",
    user: 123,
    endpoint: "GET /api/v4/:id/merge_requests"
  },
  rules: [
    {
      name: "merge_requests_api",
      match: { endpoint: "GET /api/v4/:id/merge_requests" },
      characteristics: [:user],
      limit: 20,
      period: 1.minute,
      action: :log
    },
    {
      name: "generic_api",
      match: { request_type: "api" },
      characteristics: [:user],
      limit: 4,
      period: 1.minute,
      action: :log
    }
  ]
)
```
Labkit evaluates rules in order — **first match wins**. The first rule whose `match` key-value pairs are all present in the identifier is the one that gets counted. The counter key is derived by prepending the call site name and then projecting the identifier onto the matched rule's characteristics. For example, call site `"rack_request_user"` with characteristics `[:user]` produces the counter key `rack_request_user:user:123`.
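A minimal sketch of that derivation (illustrative, not the final labkit code):

```ruby
# Counter key = call site name, then key/value pairs for the matched
# rule's characteristics, projected from the identifier.
def counter_key(rate_limiter, identifier, rule)
  parts = [rate_limiter]
  rule[:characteristics].each do |key|
    parts << key.to_s << identifier.fetch(key).to_s
  end
  parts.join(":")
end

counter_key("rack_request_user", { user: 123, request_type: "api" },
            { characteristics: [:user] })
# => "rack_request_user:user:123"
```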
The result object carries the outcome and the resolved values:
```ruby
result.matched? # => true (a rule matched)
result.exceeded? # => true/false (matched rule's count > limit)
result.action # => :block, :log, or :allow (the outcome — what the caller should do)
result.rule # => the matched Rule object (rule.action has the configured action)
result.error? # => true if Redis was unavailable
result.resolved_limit # => 20 (the resolved limit as Integer, nil when unmatched/error)
result.resolved_period # => 60 (the resolved period as Integer, nil when unmatched/error)
```
When a rule matches but the count is within the limit, `result.action` returns `:allow` regardless of the rule's configured action. When the count exceeds the limit, `result.action` returns the rule's configured action (`:block` or `:log`). The caller only needs to check `result.action` to decide what to do:
```ruby
case result.action
when :block then render_429
when :log then # shadow mode: log that we would have blocked
when :allow then # within limits, no match, or error — proceed
end
```
Response header state (`remaining`, `reset_time`) will be added to the result object in a follow-up issue ([#28785](https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28785)).
**Characteristic validation:** If a characteristic key is not present in the identifier, the behavior depends on the environment:
- In **development and test**: raise an error. This catches misconfigured call sites early.
- In **production**: log a warning and use a sentinel value (`_unknown_`) in the counter key. All requests missing that dimension share a counter. This ensures requests are never silently unrated.
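A minimal sketch of that behavior, assuming hypothetical `Labkit.dev_or_test?` and `Labkit.logger` helpers:

```ruby
# Sketch: resolve one characteristic's value, or fail loudly/softly by environment.
def characteristic_value(identifier, key)
  return identifier.fetch(key).to_s if identifier.key?(key)

  raise ArgumentError, "identifier missing characteristic: #{key}" if Labkit.dev_or_test?

  Labkit.logger.warn(message: "rate_limit_missing_characteristic", characteristic: key)
  "_unknown_" # sentinel: all requests missing this dimension share one counter
end
```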
**Failure mode:** If Redis is unavailable for counting, labkit **fails open** — the request is allowed through. The result object indicates that counting failed (`result.error?` returns true) and `result.action` returns `:allow`. This ensures a Redis outage does not become an application outage. The failure is logged with the full call site name and identifier so it's visible in monitoring. This matches the current behavior of both RackAttack (which uses `Rack::Attack::StoreProxy` with silent failures) and `ApplicationRateLimiter` (which rescues Redis errors).
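A sketch of the fail-open path (the counting helper and `Result` internals are illustrative; `::Redis::BaseError` is the redis-rb base error class):

```ruby
# Sketch: counting errors are logged and converted to an allow result.
begin
  count = increment_counter(key, rule[:period]) # counting call, name illustrative
  Result.new(rule: rule, count: count)
rescue ::Redis::BaseError => e
  Labkit.logger.error(
    message: "rate_limit_check_failed",
    rate_limiter: rate_limiter,
    identifier: identifier,
    error_class: e.class.name
  )
  Result.new(error: true) # result.error? => true, result.action => :allow
end
```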
#### 1c. Rate limit state for response headers
The core API returns a result object with `matched?`, `exceeded?`, `action` (the outcome: `:block`, `:log`, or `:allow`), `rule`, `error?`, `resolved_limit`, and `resolved_period`. Response header state (`remaining`, `reset_time`, `limit`, `counter_key`) will be added to the result object in a follow-up issue ([#28785](https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28785)), providing the fields needed for `RateLimit-Limit`, `RateLimit-Remaining`, `RateLimit-Reset`, and `Retry-After` headers.
Since labkit uses first-match-wins (one matched rule per check), the response header state is always for that single matched rule — there is no need for a `most_restrictive` selection across multiple results.
#### 1d. Logging and observability
Observability for the new rate limiting API is tracked in two separate issues:
- **Prometheus metrics:** [#28798 — Add Prometheus metrics for labkit rate limit checks](https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28798). Covers counters for total checks and exceeded checks, plus gauges for configured limits and periods. Designed to provide equivalent observability to the existing `gitlab_rack_attack_events_total`, `gitlab_rack_attack_throttle_limit`, `gitlab_rack_attack_throttle_period_seconds`, and `gitlab_application_rate_limiter_throttle_utilization_ratio` metrics.
- **Per-request structured logging:** [#28799 — Include rate limit state in existing per-request log messages](https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28799). Instead of emitting separate log messages (which would double log volume), rate limit information is added to the existing per-request structured log entries. Depends on #28785 (enriched result object).
Key requirements:
- The `action` field in logs and metrics reflects the outcome: `block` (exceeded and enforced), `log` (exceeded but only logged), or `allow` (within limits, no match, or error)
- Logging enables comparing the new middleware's decisions against RackAttack's actual enforcement decisions during the parallel-run phase
- Extensible for future identity types (GATE's `workload_identity`, `ambient_credential`) — these are just new keys in the identifier
**Steps:**
- Define the identifier structure and serialization format
- Implement the call-site API (check with call site name, identifier, and rules array)
- Implement first-match-wins rule evaluation (match conditions evaluated against identifier, first match counted)
- Implement counting against Redis for the matched rule using characteristics-derived keys
- Implement result object (`matched?`, `exceeded?`, `action` as outcome, `rule`, `error?`, `resolved_limit`, `resolved_period`)
- Implement characteristic validation (error in dev/test, warning + sentinel in production)
- Implement structured logging for the matched rule
- Implement fail-open on Redis errors
- Include rubocop rules and commit guidelines to avoid breaking downstream CI
- Tag first release
**Verification:** CI passes, gem installs cleanly, interface is consumable from a Rails project. Unit tests cover: identifier construction, first-match-wins rule evaluation (rule ordering, first match only counted, no-match case, empty rules), counter key derivation from characteristics, action outcome (`:block`, `:log`, `:allow`) behavior, result object fields including `resolved_limit`/`resolved_period`, characteristic validation (error in test, warning + sentinel in production), Redis fail-open behavior, structured log output. Integration test demonstrates a Rails app checking rate limits via labkit.
---
### Stage 2 — Migrate Application Rate Limiting to labkit
_Rails consumes the labkit-ruby interface established in Stage 1. All work in this stage is feature-flagged._
#### 2a. Migrate ApplicationRateLimiter to labkit API (feature-flagged)
**Repo:** `gitlab-org/gitlab`
Currently, `ApplicationRateLimiter` has a `rate_limits` hash with hardcoded defaults and three counting strategies (`IncrementPerAction`, `IncrementPerActionedResource`, `IncrementResourceUsagePerAction`). Some limits have corresponding `ApplicationSetting` columns, some don't.
This is a **clean switch** — behind a feature flag, replace the internal counting with labkit. Controllers and services continue to call `ApplicationRateLimiter.throttled?` as before. We do this first because the feature flag gives us a clean on/off switch per limit, making it safer to roll out incrementally. A sketch of this delegation follows the list below.
- Inside `ApplicationRateLimiter.throttled?`, construct the call site name (the limit name, e.g., `"pipelines_create"`) and the identifier (key-value object with scope objects serialized as key-value pairs)
- Construct a single-element rules array with the `match`, `characteristics`, `limit`, `period`, and `action` derived from existing config
- Resolve characteristics from the existing scope (e.g., `scope: [user, project]` becomes `characteristics: [:user, :project]`)
- Resolve limit and period from the `rate_limits` hash and/or `ApplicationSetting` columns
- Map existing dry-run behavior to `action: :log`
- Pass everything to `Labkit::RateLimiting.check(rate_limiter:, identifier:, rules:)` and use the result
- Ensure `ApplicationSetting`-backed limits continue to work as they do today
- **Feature-flagged**: when on, use labkit for counting; when off, fall back to current behavior
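Taken together, a hedged sketch of the switch (the flag name, `identifier_for`, and `legacy_throttled?` are illustrative, not final):

```ruby
# Inside ApplicationRateLimiter — sketch only, names are not final.
def throttled?(key, scope:, **options)
  unless Feature.enabled?(:application_rate_limiter_labkit)
    return legacy_throttled?(key, scope: scope, **options) # current code path
  end

  limit_config = rate_limits.fetch(key) # existing hash: { threshold:, interval: }
  result = Labkit::RateLimiting.check(
    rate_limiter: key.to_s,
    identifier: identifier_for(scope), # e.g. { user: 42, project: 789 }
    rules: [{
      name: key.to_s,
      characteristics: scope.map { |obj| obj.class.name.underscore.to_sym },
      limit: limit_config[:threshold],
      period: limit_config[:interval],
      action: :block # dry-run variants map to :log
    }]
  )
  result.action == :block
end
```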
#### 2b. New rack rate-limiting middleware (log mode)
**Repo:** `gitlab-org/gitlab`
A new middleware runs **alongside** the existing RackAttack middleware. RackAttack continues to enforce as-is — this is not a replacement, it's a parallel run. A sketch follows the list below.
The new middleware:
- Constructs the identifier (key-value object with `request_type`, `ip`, `user`, `root_namespace_path`, `namespace_plan`, `endpoint`) from the Rack request
- Translates existing RackAttack throttle definitions into rules (from `Gitlab::Throttle` options, `ApplicationSetting` DB columns, env vars), ordered most-specific-first for first-match-wins evaluation
- Makes separate `check` calls for different rate limiting concerns (e.g., `rate_limiter: "rack_request_ip"` for per-IP limits, `rate_limiter: "rack_request_user"` for per-user limits), mirroring how RackAttack has separate throttle definitions for unauthenticated and authenticated requests
- All rules start with `action: :log` — counting and logging but not blocking
- Can compare its decisions against RackAttack's actual enforcement decisions (via the request env) to validate correctness before transitioning to enforcement
- **Feature-flagged** for mounting (when flag is off, middleware is not loaded)
- Eventually transitions rules to `action: :block` and RackAttack is retired
The existing RackAttack middleware and its env var mechanisms (`GITLAB_THROTTLE_DRY_RUN`, `GITLAB_THROTTLE_USER_ALLOWLIST`, `GITLAB_THROTTLE_BYPASS_HEADER`) are unaffected.
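A minimal sketch of such a middleware (the class name and the `request_type`, `endpoint_id`, `translated_rules_for`, and `compare_with_rack_attack` helpers are assumptions for illustration):

```ruby
# Sketch of the parallel-run middleware: it observes and logs; RackAttack
# still enforces. Only the per-IP check is shown.
class RackRateLimitObserver
  def initialize(app)
    @app = app
  end

  def call(env)
    request = ActionDispatch::Request.new(env)

    result = Labkit::RateLimiting.check(
      rate_limiter: "rack_request_ip",
      identifier: {
        request_type: request_type(request), # "api" or "web"
        ip: request.ip,
        endpoint: endpoint_id(request)
      },
      rules: translated_rules_for(:ip) # from Gitlab::Throttle + settings, all action: :log
    )

    compare_with_rack_attack(env, result) # log agreement/divergence with RackAttack
    @app.call(env) # log mode: never blocks
  end
end
```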
#### 2c. Consistent response headers
- Depends on the result object being enriched with `remaining`, `reset_time`, `limit` in [#28785](https://gitlab.com/gitlab-com/gl-infra/production-engineering/-/work_items/28785)
- Use the enriched result to generate consistent headers (`RateLimit-Limit`, `RateLimit-Remaining`, `RateLimit-Reset`, `Retry-After`) regardless of which limiter triggered them
- Include the call site name and matched rule name in the response so users (and support) can identify which specific limit was hit
- Align with the header format already implemented in `Gitlab::RackAttack::RequestThrottleData`
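Assuming the enriched result fields from #28785 (`remaining`, `reset_time`), header generation could look roughly like this sketch:

```ruby
# Sketch only: depends on the result enrichment tracked in #28785.
def rate_limit_headers(result)
  return {} unless result.matched?

  {
    "RateLimit-Limit" => result.resolved_limit.to_s,
    "RateLimit-Remaining" => result.remaining.to_s,   # from #28785
    "RateLimit-Reset" => result.reset_time.to_i.to_s, # from #28785
    "Retry-After" => [result.reset_time.to_i - Time.now.to_i, 0].max.to_s
  }
end
```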
#### 2d. Default rate limits for new endpoints
- Implement middleware/interceptor that applies a sensible default rate limit to any endpoint that doesn't have an explicit rate limit call
- The default should be overridable per-endpoint
- This ensures new features ship with rate limiting from day one
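As a sketch (the detection flag, default values, and helpers are all assumptions to be settled during implementation):

```ruby
# Sketch: in a late-running middleware, apply a default rule only when no
# explicit rate limit check ran for this request.
def apply_default_limit(env, request)
  return if env["gitlab.rate_limit_checked"] # hypothetical flag set by explicit checks

  Labkit::RateLimiting.check(
    rate_limiter: "default_endpoint",
    identifier: { endpoint: endpoint_id(request), ip: request.ip },
    rules: [{
      name: "default",
      characteristics: [:ip],
      limit: 600,        # placeholder default, overridable per endpoint
      period: 1.minute,
      action: :log       # defaults start in log mode, like the new middleware
    }]
  )
end
```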
**Key constraint:** All changes in this stage must maintain backwards compatibility. Existing API endpoints, Application Settings columns, and env var mechanisms must continue to work. We are _expanding_ configuration options, not replacing them. Feature flags allow safe rollout and instant rollback.
**Verification:** All existing rate limiting tests pass (Stage 0 baseline). New tests cover: ApplicationRateLimiter clean switch with single-rule array, identifier construction and rule translation in new rack middleware (with first-match-wins ordering), comparison of labkit decisions vs RackAttack decisions in log mode, default rate limit applied to unprotected endpoint, feature flag on/off behavior.
---
## Agent Workflow (applies to every issue)
Each unit of work follows this loop:
1. **Read** — Agent reads issue + parent epic via MCP, surfaces spec gaps
2. **Spec** — Agent writes spec to the GitLab issue using the Spec Template below. Humans can redirect before any code is written
**Issue handling rules (applied before writing the spec):**
- **Existing issue:** (a) post the original description as a comment prefixed with `Original issue description (preserved before spec update):` so history is never lost; (b) replace the issue description with the full spec content using the Spec Template; (c) prepend `[Spec N]` to the issue title, where N is the sequential spec number for this stage (e.g. `[Spec 1] Create and initialise a project to house a YAML schema`).
- **No existing issue:** create a new issue in `gitlab-com/gl-infra/production-engineering`, then apply the same three steps above.
- **After either path:** link the issue as a child of epic #2021 (Phase 2: Agentic Implementation Plan) using the glab snippet below. This ensures every spec-driven issue is navigable from the source of truth.
**glab: link issue as child of epic #2021**
```bash
# Get the issue's WorkItem global ID
CHILD_ID=$(glab api graphql -f query='query($iid: String!) {
  project(fullPath: "gitlab-com/gl-infra/production-engineering") {
    workItems(iid: $iid) { nodes { id } }
  }
}' -f iid="<ISSUE_IID>" | python3 -c "import json,sys; print(json.load(sys.stdin)['data']['project']['workItems']['nodes'][0]['id'])")

# Set epic #2021 as the parent (update the child, not the parent)
glab api graphql -f query="mutation {
  workItemUpdate(input: {
    id: \"$CHILD_ID\"
    hierarchyWidget: { parentId: \"gid://gitlab/WorkItem/188522358\" }
  }) {
    workItem { id title }
    errors
  }
}"
```
3. **Adversarial Review** — A second agent reviews the spec against the Adversarial Checklist and posts findings tagged BLOCKER / CONCERN / PASS as a reply to the spec comment
- If no BLOCKERs → proceed to step 4
- If BLOCKERs exist → go to step 3a
3a. **Resolve Blockers** — The implementing agent researches each BLOCKER (reads source code, checks referenced issues, verifies assumptions) and posts a spec update as a reply to the adversarial review discussion. Then loop back to step 3. Maximum 2 resolution rounds before escalating to human.
4. **Implement** — Agent writes code (Ruby, YAML, Go as needed) and opens MR to appropriate repo. At the start of this step: set the issue to `workflow-infra::In Dev` and assign it to the user who kicked off the agent. All migrations of existing limiters must be behind **feature flags**.
5. **Verify** — Agent runs test suite, posts evidence (test output, CI results) to issue/MR. No MR without passing evidence
6. **Adversarial Review (MR)** — Second agent reviews the MR diff against the spec: are all Given/When/Then scenarios covered by tests? Are there new edge cases introduced by the implementation? Posts findings to MR.
Also check whether GitLab Duo has posted a review on the MR (it runs automatically on many MRs). If a Duo review is present, treat its findings as additional input: any Duo findings rated critical or high must be either fixed or explicitly accepted with rationale before requesting human review. Incorporate Duo findings into the adversarial review comment so the human reviewer sees everything in one place.
- If BLOCKERs (from adversarial review or Duo review) → agent fixes and re-runs verification (step 5). Maximum 2 rounds before escalating to human
7. **Human Review** — Staff engineer reviews MR in GitLab, with adversarial findings visible
8. **Merge**
GitLab issues are source of truth for specs. GitLab MRs are source of truth for verification. Every spec written by an agent must include measurable pass/fail criteria.
**Escalation rule:** If the blocker resolution loop exceeds 2 rounds without clearing all BLOCKERs, the agent posts a summary of the unresolved blockers and the research done, then pauses for human input. Agents do not loop indefinitely.
---
## Spec Template
Agents MUST use this template when writing a spec to a GitLab issue. All sections marked **[required]** must be present and non-empty. Omitting a required section is a blocker in adversarial review.
```markdown
## Spec
### Problem Statement [required]
<!-- What is broken, missing, or inconsistent? Why does it matter?
Be specific: name the code path, config location, or user-visible behavior. -->
### Non-Goals [required]
<!-- Explicit list of what this issue does NOT cover.
At minimum: list the next sub-phase items that are out of scope. -->
### Acceptance Criteria [required]
<!-- Use Given/When/Then format. Each scenario must map to at least one test. -->
**Scenario 1: [name]**
- Given: [precondition]
- When: [action]
- Then: [observable outcome]
<!-- Add as many scenarios as needed. Include at least one failure/edge case scenario. -->
### Security Considerations [required]
<!-- Rate limiting specific: bypass risks, config injection,
exposure of limit values, privilege escalation via config changes.
Write "None identified" only after explicitly checking each. -->
### Rollout & Backwards Compatibility [required]
<!-- How does this ship safely?
- Self-managed: what happens if YAML file is absent or malformed?
- Dedicated: operator workflow for applying config
- Cells: per-cell vs global config?
- Feature flag? Dark launch? Gradual rollout? -->
### Validation Loop / Verification Process [required]
<!-- How will the agent verify this is correct before requesting human review?
List the specific commands, test files, or CI checks that must pass.
Output of these must be posted as evidence on the MR. -->
### Observability [optional]
<!-- What does a successful deployment look like in metrics/logs?
During an incident, what can an agent or SRE query to confirm
this limit is behaving correctly? Log fields, metric names, dashboards. -->
```
---
## Adversarial Review Checklist
The adversarial review agent posts a structured review as a comment. Each item must be explicitly addressed (Pass / Fail / Accepted Risk + reason):
**Spec Review (Step 3):**
- [ ] All required spec sections present and non-empty
- [ ] Every acceptance criterion has a testable, observable outcome (not "should work" — what exactly is observed?)
- [ ] At least one failure/edge case scenario is covered (empty config, malformed YAML, missing field, partial migration)
- [ ] Security section explicitly addresses: bypass conditions, config injection, value exposure
- [ ] Backwards compat covers all three deployment types (Self-Managed, Dedicated, Cells)
- [ ] Verification process names specific files/commands — not "run the tests"
- [ ] **Incident auditability:** Could an agent triage a rate limiting incident at 3am using only what this spec describes? Is the config location, log format, and expected behavior unambiguous?
**MR Review (Step 6):**
- [ ] Every Given/When/Then scenario in the spec has a corresponding test
- [ ] Tests would catch a no-op implementation (do they actually assert behavior, not just that the method was called?)
- [ ] No new untested code paths introduced
- [ ] Config file changes are backwards compatible with the previous schema version
- [ ] Log output is human and agent readable (structured, includes limit name + identifier + count)
- [ ] Could a new SRE understand what this rate limit does from the YAML alone, without reading code?
- [ ] GitLab Duo review checked (if present): any critical/high findings are either fixed or explicitly accepted with rationale in the adversarial review comment
- [ ] Feature flag behavior tested: both enabled and disabled paths verified
---
## Verification Standards (All Stages)
Every MR must include:
- **Test coverage:** RSpec (Ruby), Go tests (Gitaly), YAML schema validation (JSON Schema)
- **Evidence:** CI pipeline results, test output posted as comment to the issue
- **No regression check:** existing rate limit count/behavior compared before/after
- **Backwards compatibility:** self-managed fallback behavior tested explicitly
- **Feature flag:** on/off behavior tested for all migration paths
Agents may not close an issue or mark an MR ready for review without posting test evidence.
---
## Key Risks
| Risk | Mitigation |
|------|------------|
| Insufficient test coverage of existing rate limiting code before changes | Stage 0 is a hard prerequisite — no code changes to rate limiting until coverage baseline is established |
| Breaking changes for self-managed customers | Classify every change as "gitlab-com only" or "affects self-managed." Existing config methods always preserved as fallback. Feature flags on all migration paths. Deprecation process for any removals |
| labkit-ruby gem versioning — Rails depends on a specific version; coordinating releases between labkit-ruby and gitlab-rails adds a step to the workflow | Keep labkit-ruby changes backwards compatible; Rails pins to a minimum version; gem update is part of Stage 2 issue acceptance criteria |
| No Cloud Connector backend engineer (arrangement fell through) | Agent is primary implementer; monolith work requires GitLab maintainer review — identify maintainer reviewer before Stage 2 |
| RackAttack upstream inactive | Avoid changes that require upstream merge; fork or monkey-patch if needed, document technical debt |
| GATE auth flows introduce new identity types | Extensible identifier dimensions from day 1; CODEOWNERS enforcement |
| Dedicated needs per-throttle log-only mode | Include in Stage 2 acceptance criteria via `action: :log`; blocks Dedicated epic #664 |
| Identifier design not expressive enough for future external service | Designed as a structured key-value object with extensible dimensions from day 1. The identifier carries all dimensions needed for future expression-based matching without changing the core model |
---
## Source of Truth
This plan lives as a GitLab epic, child of https://gitlab.com/groups/gitlab-com/gl-infra/-/work_items/1534, titled **"Phase 2: Agentic Implementation Plan"**. The GitLab epic is authoritative — local files are bootstrap artifacts only. All spec comments, adversarial review comments, and status updates are written directly to GitLab issues/epics via MCP.
## Keeping This Plan Up to Date
This epic description is the authoritative plan. All changes to scope, workflow, or stage definitions must be written here first. Local files (e.g. Claude plan files) are discarded after bootstrapping.
### Reading the current plan
**Via MCP** (in Claude Code sessions):
```
mcp__GitLab__get_workitem_notes url="https://gitlab.com/groups/gitlab-com/gl-infra/-/work_items/2021"
```
Use this to read status update comments and discussion. To read the description itself, use the GraphQL query below.
**Via glab** (terminal or agent bash):
```bash
glab api graphql -f query='
query {
  group(fullPath: "gitlab-com/gl-infra") {
    workItem(iid: "2021") {
      widgets {
        ... on WorkItemWidgetDescription { description }
      }
    }
  }
}' | python3 -c "
import json, sys
d = json.load(sys.stdin)
for w in d['data']['group']['workItem']['widgets']:
    if 'description' in w and w['description']:
        print(w['description'])
"
```
### Updating the plan description
Always fetch the current description first, apply edits, then push the full updated body. Never do a partial overwrite without reading first.
**Via glab**:
```bash
glab api graphql -f query='
mutation($id: WorkItemID!, $desc: String!) {
  workItemUpdate(input: {
    id: $id
    descriptionWidget: { description: $desc }
  }) {
    workItem { webUrl }
    errors
  }
}' \
  -f id="gid://gitlab/WorkItem/188522358" \
  -f desc="$UPDATED_DESCRIPTION"
```
The work item global ID is `gid://gitlab/WorkItem/188522358` (iid 2021 in gitlab-com/gl-infra).
### Adding status updates and notes
Use MCP to post status updates, decisions, or adversarial review findings as comments — do not edit the description for these:
**Via MCP**:
```
mcp__GitLab__create_workitem_note
url="https://gitlab.com/groups/gitlab-com/gl-infra/-/work_items/2021"
body="## Status Update: <date>\n..."
```
**Via glab**:
```bash
glab api graphql -f query='
mutation($id: NoteableID!, $body: String!) {
  createNote(input: { noteableId: $id, body: $body }) {
    note { id }
    errors
  }
}' \
  -f id="gid://gitlab/WorkItem/188522358" \
  -f body="## Status Update: $(date +%Y-%m-%d)\n..."
```
### Rule
> **Never treat a local file as the plan.** If the GitLab epic description and any local copy diverge, the GitLab epic wins. Agents must fetch the current description at the start of any planning or update session.