Aggregation queries bypass per-entity authorization for target nodes
## Summary

Aggregation queries in the Knowledge Graph query engine do not enforce per-entity authorization (redaction) on the target node, only on the `group_by` node. This allows a user with `read_project` but without `read_vulnerability` to extract full vulnerability details (count, IDs, severity, state, report type, title, timestamps) for any project within their namespace scope.

## Reproduction

Attached: [aggregation_inference.py](/uploads/73ebfca273daaf483549a146b09f208b/aggregation_inference.py)

```bash
python3 aggregation_inference.py \
  --gitlab-url http://gdk.test:8080 \
  --token <PAT_WITH_REPORTER_ROLE_ONLY> \
  --project-id SOMEPROJECT
```

The script extracts vulnerability details using only aggregation queries. No `read_vulnerability` ability is required or checked.

Example output:

```
[*] Phase 1: Counting vulnerabilities via aggregation...
    Vulnerabilities found: 1
[*] Phase 2: Extracting severity distribution...
    critical: 1
[*] Phase 5: Extracting individual vulnerability IDs...
    Found 1 vulnerability IDs: [399]
[*] Phase 6: Extracting details for 1 vulnerabilities...
    --- Vulnerability ID: 399 ---
    severity: critical
    state: detected
    report_type: generic
    title: 'test vuln'
    created_at: 2026-03-24
```

## Root Cause

### 1. Compiler only adds redaction columns for `group_by` nodes

In `crates/query-engine/compiler/src/enforce.rs` (lines 133-143), the `selectable_nodes` set for aggregation queries is built exclusively from `group_by` aliases:

```rust
let selectable_nodes: HashSet<&str> = match input.query_type {
    QueryType::Aggregation => input
        .aggregations
        .iter()
        .filter_map(|agg| agg.group_by.as_deref())
        .collect(),
    // ...
};
```

Later (lines 193-198), nodes not in `selectable_nodes` are skipped, so no `_gkg_{alias}_id` or `_gkg_{alias}_type` columns are added for the target node. This is explicitly tested at line 554 (`aggregation_only_adds_columns_for_group_by_nodes`).

### 2. `resource_checks()` never collects target node IDs

In `crates/query-engine/types/src/query_result.rs` (line 263), `resource_checks()` iterates over `self.ctx.nodes()`. Since the target node was never added to the `ResultContext`, its IDs are never collected for the Rails authorization exchange.

### 3. Authorization stage skips unchecked entity types

In the authorization stage, when `resource_checks()` returns no checks for the Vulnerability type, no `read_vulnerability` call is made to Rails. The rows pass through with only the `group_by` node (Project) being authorized.

### 4. Target node filters ARE applied in SQL

Meanwhile, the target node's filters from the query JSON are applied in the SQL `WHERE` clause via `build_full_where()` in `lower.rs`. This creates a 1-bit oracle: the attacker can test any filter condition on any target node column and observe whether the count is 0 or 1.

## Attack Technique

The attacker uses aggregation queries with `group_by` on a node they CAN access (Project) and `count` on a node they CANNOT access (Vulnerability):

```json
{
  "query_type": "aggregation",
  "nodes": [
    {"id": "p", "entity": "Project", "node_ids": [40]},
    {"id": "v", "entity": "Vulnerability", "filters": {"severity": "critical"}}
  ],
  "relationships": [{"type": "IN_PROJECT", "from": "v", "to": "p"}],
  "aggregations": [{"function": "count", "target": "v", "alias": "c", "group_by": "p"}]
}
```

The Project node is authorized (redaction checked), but the Vulnerability node is not. The filter `{"severity": "critical"}` is applied in the SQL `WHERE` clause, and the resulting count (0 or 1) leaks whether the condition matched.
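
The oracle primitive described above can be sketched in a few lines of Python. This is a minimal illustration, not an excerpt from the attached script: the helper names `build_count_query` and `condition_matched` are hypothetical, and the sketch only builds the payload without sending it, since this issue does not name the HTTP endpoint.

```python
import json
from typing import Any


def build_count_query(project_id: int, target_filters: dict[str, Any]) -> dict[str, Any]:
    """Build the aggregation payload shown above (hypothetical helper).

    The Project node is the group_by node (authorized); the Vulnerability
    node is the count target, and its filters carry the condition under test.
    """
    return {
        "query_type": "aggregation",
        "nodes": [
            {"id": "p", "entity": "Project", "node_ids": [project_id]},
            {"id": "v", "entity": "Vulnerability", "filters": target_filters},
        ],
        "relationships": [{"type": "IN_PROJECT", "from": "v", "to": "p"}],
        "aggregations": [
            {"function": "count", "target": "v", "alias": "c", "group_by": "p"}
        ],
    }


def condition_matched(count: int) -> bool:
    """Interpret the returned count as the oracle's one-bit answer."""
    return count > 0


# Same query as the JSON example: project 40, testing severity == critical.
query = build_count_query(40, {"severity": "critical"})
print(json.dumps(query, indent=2))
```

Each call to `build_count_query` with a different filter corresponds to one question put to the oracle; only the `filters` value of the unauthorized node changes between requests.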
By iterating through possible values, the attacker extracts:

| Field | Technique | Queries |
|-------|-----------|---------|
| Count | Single query | 1 |
| IDs | Binary search on `id` with `lte`/`lt` | O(N log max_id) |
| Enum fields (severity, state, report_type) | Test each value | O(values) |
| Boolean fields | Test true/false | 2 |
| String fields (title) | Character-by-character with `starts_with` | O(len * charset) |
| Timestamps | Binary search on year/month/day with `gte`/`lte` | O(log range) |

## Affected Entity Pairs

Any entity pair where the `group_by` node has a lower privilege requirement than the `count` target:

- `read_project` without `read_vulnerability` → leak vulnerability details
- `read_project` without `read_security_resource` → leak security findings
- `read_merge_request` without `read_code` → leak file paths, definitions
- `read_project` without `read_build` → leak CI pipeline/job details

## Mitigation Options

1. **Add redaction for target nodes in aggregation queries.** The compiler should add `_gkg_{alias}_id` and `_gkg_{alias}_type` for target nodes, and `resource_checks()` should collect their IDs. Rows where the user lacks the target entity's ability should return count 0.
2. **Validate abilities at query time.** Before compiling, check that the requesting user has the required ability for ALL entity types in the query, not just the `group_by` entities.
3. **Restrict filters on unauthorized entities.** If the user doesn't have the target entity's ability, reject queries with filters on that entity's columns (prevents the oracle).
4. **Rate-limit aggregation queries.** As a defense-in-depth measure, limit the number of aggregation queries per user per time window to make the binary search extraction impractical.

## Attachments

[aggregation_inference.py](/uploads/73ebfca273daaf483549a146b09f208b/aggregation_inference.py) proof of concept script
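
As an illustration of mitigation option 2 above, the following Python sketch checks every node in a query against the requester's abilities before compilation. It is a sketch under stated assumptions, not the engine's implementation: `REQUIRED_ABILITY` and `missing_abilities` are hypothetical names, and the mapping lists only the Project and Vulnerability pair named in this issue.

```python
from typing import Any

# Hypothetical mapping from entity type to the ability needed to read it.
# Only the pair discussed in this issue is listed; a real mapping would come
# from the authorization layer.
REQUIRED_ABILITY = {
    "Project": "read_project",
    "Vulnerability": "read_vulnerability",
}


def missing_abilities(query: dict[str, Any], user_abilities: set[str]) -> set[str]:
    """Return the abilities required by any node in the query that the user lacks.

    Every node is inspected, including aggregation targets, not only the
    group_by nodes.
    """
    missing: set[str] = set()
    for node in query.get("nodes", []):
        entity = node["entity"]
        ability = REQUIRED_ABILITY.get(entity)
        if ability is None:
            # Fail closed: an entity type with no known ability is rejected.
            missing.add(f"<unknown ability for {entity}>")
        elif ability not in user_abilities:
            missing.add(ability)
    return missing


attack_query = {
    "query_type": "aggregation",
    "nodes": [
        {"id": "p", "entity": "Project", "node_ids": [40]},
        {"id": "v", "entity": "Vulnerability", "filters": {"severity": "critical"}},
    ],
}

# A user holding only read_project would be rejected before compilation.
print(missing_abilities(attack_query, {"read_project"}))  # {'read_vulnerability'}
```

Because authorization in this system is per entity, a type-level check like this is at most a coarse first gate; it would complement, not replace, the per-row redaction described in option 1.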
issue