perf(compiler): close variable-length traversal cliff (!1069) · Merge requests · GitLab.org / orbit / GitLab Knowledge Graph

A 3-rel multi-hop count had two compiler issues: it silently undercounted paths by ~8% and ran 22 to 34% slower than necessary. Both surface in the optimizer's handling of variable-length UNION arms.

The cliff

Query: User -> AUTHORED -> MR -> IN_PROJECT -> Project <- CONTAINS x{1..3} <- Group, user pinned, aggregation count. The compiler emitted SQL with two compounding issues:

Wrong SIP placement. The to-side cascade IN-subquery was injected on e1.target_id for every UNION arm regardless of depth. At depth>1 e1.target_id is an intermediate hop, not the to-side, so depth>1 arms returned no rows and the count silently undercounted.
Static kind constraints stopped at the OUTER alias. inject_entity_kind_filters constrained hop_e<i>.source_kind and target_kind, but ClickHouse will not propagate those into per-arm scans. Each arm scanned every CONTAINS edge in the namespace instead of using the kind-led PK projections (by_rel_source_kind, by_rel_target_kind).

Numbers

3 runs each, this MR rebased on top of !1067 (merged). ClickHouse-reported metrics (query_duration_ms, read_rows, read_bytes, memory_usage) against a representative dataset.

Metric	Before (main HEAD)	After (this MR)	Change
Count	2043 (under)	2207 (correct)	+164 paths recovered
Wall	3.64-4.78 s	2.84-3.64 s	-22 to -34%
`query_duration_ms`	3175-4323 ms	2389-3207 ms	-25 to -36%
`read_rows`	6.27 M (under-scan)	13.20 M (correct work)	+110% (real paths)
`read_bytes`	214 MB	657 MB	+207% (correct work)
`memory_usage`	426-553 MB	682-722 MB	higher (driven by correct depth-2/3 inclusion)

What changed

All in optimize.rs.

Anchor parameter on inject_sip_for_aliases. Adds ArmAnchor::{First,Last}. The to-side cascade IN-subquery now lands on the chain's rightmost edge per arm (e<depth>.target_id outgoing, e<depth>.source_id incoming), matching the column the to-side node exposes. Restores correct counts.

New pass push_kind_literals_into_variable_length_arms. For each variable-length relationship whose endpoints have entity types, walks the corresponding hop_e<i> UNION ALL and injects static e1.<source_kind> = '<from>' and e<depth>.<target_kind> = '<to>' predicates into every arm. Static literals enable kind-led PK granule pruning; dynamic IN-subqueries cannot.

Suppress the per-arm to-side IN-subquery when both endpoints are kind-pinned. The arm-internal probe duplicates the outer node-table cascade SIP and was the dominant cost on top of an already kind-pruned scan. Pass order matters: the new pass runs immediately after inject_entity_kind_filters, so SIP injection knows whether arms already carry the kind constraint and can skip the redundant probe.

M1 input DSL

{
  "query_type": "aggregation",
  "nodes": [
    {"id": "u", "entity": "User", "node_ids": [116]},
    {"id": "mr", "entity": "MergeRequest"},
    {"id": "p", "entity": "Project"},
    {"id": "g", "entity": "Group"}
  ],
  "relationships": [
    {"type": "AUTHORED",   "from": "u",  "to": "mr"},
    {"type": "IN_PROJECT", "from": "mr", "to": "p"},
    {"type": "CONTAINS",   "from": "g",  "to": "p", "min_hops": 1, "max_hops": 3}
  ],
  "aggregations": [{"function": "count", "target": "u", "alias": "n"}],
  "limit": 3
}

Before: depth>1 arms get filtered to zero by `e1.target_id IN _cascade_p`

INNER JOIN (
  SELECT e1.target_id AS end_id, 1 AS depth FROM gl_edge AS e1
  WHERE rel_kind = 'CONTAINS' AND e1.target_id IN _cascade_p
  UNION ALL
  SELECT e2.target_id AS end_id, 2 AS depth FROM gl_edge AS e1
  INNER JOIN gl_edge AS e2 ON e1.target_id = e2.source_id AND e2.rel_kind = 'CONTAINS'
  WHERE rel_kind = 'CONTAINS'
    AND e1.target_id IN _cascade_p   -- intermediate, never matches Project IDs
  UNION ALL
  SELECT e3.target_id AS end_id, 3 AS depth ...
  WHERE rel_kind = 'CONTAINS'
    AND e1.target_id IN _cascade_p   -- intermediate, never matches Project IDs
) AS hop_e2 ...

After: per-arm kind literals; outer p.id IN _cascade_p JOIN provides the to-side filter

INNER JOIN (
  SELECT e1.target_id AS end_id, 1 AS depth FROM gl_edge AS e1
  WHERE e1.rel_kind = 'CONTAINS'
    AND e1.source_kind = 'Group' AND e1.target_kind = 'Project'
  UNION ALL
  SELECT e2.target_id AS end_id, 2 AS depth FROM gl_edge AS e1
  INNER JOIN gl_edge AS e2 ON e1.target_id = e2.source_id AND e2.rel_kind = 'CONTAINS'
  WHERE e1.rel_kind = 'CONTAINS'
    AND e1.source_kind = 'Group' AND e2.target_kind = 'Project'
  UNION ALL
  SELECT e3.target_id AS end_id, 3 AS depth ...
  WHERE e1.rel_kind = 'CONTAINS'
    AND e1.source_kind = 'Group' AND e3.target_kind = 'Project'
) AS hop_e2 ON hop_e2.end_id = p.id
INNER JOIN (SELECT * FROM gl_project AS p WHERE p.id IN _cascade_p ...) AS p ...

Tests

320 compiler unit tests pass (4 new): leftmost vs rightmost anchor walking, per-arm injection at all depths, Incoming direction column swap, skip-when-no-entity. An end-to-end test in lib.rs renders the M1 query through compile() and asserts each arm carries the expected static kind literals and that the redundant to-side IN-subquery is suppressed.

!1067 (merged) adds traversal hop frontier CTEs that narrow the from-side of intermediate edges in multi-hop traversals (Traversal queries only). This MR addresses the to-side correctness gap and the kind-pruning gap, and applies to Traversal and Aggregation. The two are complementary; no expected merge conflict.

Edited Apr 27, 2026 by Michael Angelo Rivera

perf(compiler): close variable-length traversal cliff

The cliff

Numbers

What changed

Tests

Related

Merge request reports