perf(compiler): collapse traversal paths via trie subsumption

What does this MR do and why?

Relates to #320 (closed), #402 (closed). Replaces flat N-way OR of startsWith() clauses with a PathTrie that eliminates redundant paths, including subsumption: 1/100/ + 1/100/200/ → just 1/100/.

Note on Security: The check_ast post-check validates that every emitted startsWith literal is a prefix of at least one SecurityContext path. Collapsed prefixes (e.g., 1/100/ replacing 1/100/200/) pass this check by construction — 1/100/200/ starts with 1/100/. Verified by existing accepts_lowest_common_prefix test.


Problem

The security pass injects one startsWith(traversal_path, ...) clause per authorized namespace. A user with access to 38 groups gets:

startsWith(tp, '1/')  -- global LCP (weak)
AND (
    startsWith(tp, '1/100/') OR
    startsWith(tp, '1/100/200/') OR  -- redundant: covered by 1/100/
    startsWith(tp, '1/100/201/') OR  -- redundant: covered by 1/100/
    ... 38 total
)

ClickHouse collapses the 38-way OR into a generic exclusion search with range ['1/', '10/'), effectively scanning the entire org prefix. This was identified as root cause P2 in the query timeout investigation: production queries against v17_gl_definition (45M rows) showed 144M rows read and 35.7s execution despite the PK starting with traversal_path.

The specific failing queries:

Q4: 504 Gateway Timeout, 35.7s, 144.1M rows, 2.70 GiB memory

{
  "query_type": "traversal",
  "nodes": [
    {"id": "f", "entity": "File",
     "filters": {"path": {"op": "ends_with", "value": "compiler/src/lib.rs"}}},
    {"id": "d", "entity": "Definition",
     "filters": {"name": {"op": "eq", "value": "compile"}}}
  ],
  "relationships": [{"type": "DEFINES", "from": "f", "to": "d"}]
}

Q3: 504 Gateway Timeout, 37.6s, 366.4M rows, 2.87 GiB memory

{
  "query_type": "aggregation",
  "nodes": [
    {"id": "f", "entity": "File",
     "filters": {"path": {"op": "starts_with", "value": "crates/"}}},
    {"id": "caller", "entity": "Definition"},
    {"id": "compile", "entity": "Definition",
     "filters": {"name": {"op": "eq", "value": "compile"}}}
  ],
  "relationships": [
    {"type": "DEFINES", "from": "f", "to": "caller"},
    {"type": "CALLS", "from": "caller", "to": "compile"}
  ],
  "aggregations": [{"function": "count", "target": "compile", "group_by": "caller"}]
}

EXPLAIN indexes=1 showed Granules: 21665/21943 (99% scan) on the definition table — the 38-way OR on traversal_path defeated PK-based granule pruning.

Fix

PathTrie is a segment-level trie (BTreeMap<String, PathTrie> + terminal: bool). Paths are inserted by splitting on /. Walking the trie emits only terminal nodes, pruning all descendants:

Insert: 1/100/, 1/100/200/, 1/100/201/, 1/200/
Trie:
  1 → 100 [terminal] → 200, 201 (pruned)
     → 200 [terminal]
Output: ["1/100/", "1/200/"]  (2 clauses instead of 4)

For a user with a parent group 1/10/ plus 30 subgroups, this collapses 31 OR clauses to 1. Fewer OR clauses give ClickHouse a tighter key condition range, enabling actual granule pruning instead of the degenerate ['1/', '10/') fallback.

Edited by Michael Usachenko

Merge request reports

Loading