perf(compiler): collapse traversal paths via trie subsumption
What does this MR do and why?
Relates to #320 (closed), #402 (closed). Replaces flat N-way OR of startsWith() clauses with a PathTrie that eliminates redundant paths, including subsumption: 1/100/ + 1/100/200/ → just 1/100/.
Note on Security: The check_ast post-check validates that every emitted startsWith literal is a prefix of at least one SecurityContext path. Collapsed prefixes (e.g., 1/100/ replacing 1/100/200/) pass this check by construction — 1/100/200/ starts with 1/100/. Verified by existing accepts_lowest_common_prefix test.
Problem
The security pass injects one startsWith(traversal_path, ...) clause per authorized namespace. A user with access to 38 groups gets:
startsWith(tp, '1/') -- global LCP (weak)
AND (
startsWith(tp, '1/100/') OR
startsWith(tp, '1/100/200/') OR -- redundant: covered by 1/100/
startsWith(tp, '1/100/201/') OR -- redundant: covered by 1/100/
... 38 total
)ClickHouse collapses the 38-way OR into a generic exclusion search with range ['1/', '10/'), effectively scanning the entire org prefix. This was identified as root cause P2 in the query timeout investigation: production queries against v17_gl_definition (45M rows) showed 144M rows read and 35.7s execution despite the PK starting with traversal_path.
The specific failing queries:
Q4: 504 Gateway Timeout, 35.7s, 144.1M rows, 2.70 GiB memory
{
"query_type": "traversal",
"nodes": [
{"id": "f", "entity": "File",
"filters": {"path": {"op": "ends_with", "value": "compiler/src/lib.rs"}}},
{"id": "d", "entity": "Definition",
"filters": {"name": {"op": "eq", "value": "compile"}}}
],
"relationships": [{"type": "DEFINES", "from": "f", "to": "d"}]
}Q3: 504 Gateway Timeout, 37.6s, 366.4M rows, 2.87 GiB memory
{
"query_type": "aggregation",
"nodes": [
{"id": "f", "entity": "File",
"filters": {"path": {"op": "starts_with", "value": "crates/"}}},
{"id": "caller", "entity": "Definition"},
{"id": "compile", "entity": "Definition",
"filters": {"name": {"op": "eq", "value": "compile"}}}
],
"relationships": [
{"type": "DEFINES", "from": "f", "to": "caller"},
{"type": "CALLS", "from": "caller", "to": "compile"}
],
"aggregations": [{"function": "count", "target": "compile", "group_by": "caller"}]
}EXPLAIN indexes=1 showed Granules: 21665/21943 (99% scan) on the definition table — the 38-way OR on traversal_path defeated PK-based granule pruning.
Fix
PathTrie is a segment-level trie (BTreeMap<String, PathTrie> + terminal: bool). Paths are inserted by splitting on /. Walking the trie emits only terminal nodes, pruning all descendants:
Insert: 1/100/, 1/100/200/, 1/100/201/, 1/200/
Trie:
1 → 100 [terminal] → 200, 201 (pruned)
→ 200 [terminal]
Output: ["1/100/", "1/200/"] (2 clauses instead of 4)For a user with a parent group 1/10/ plus 30 subgroups, this collapses 31 OR clauses to 1. Fewer OR clauses give ClickHouse a tighter key condition range, enabling actual granule pruning instead of the degenerate ['1/', '10/') fallback.