perf(compiler): skip node table join for edge-only count aggregations

What does this MR do and why?

When an aggregation's target node has no filters, no pinned node_ids, and no property-based aggregation (e.g. sum on a node column), the compiler now skips joining the target node table entirely and aggregates directly from the edge table. This eliminates reading millions of node rows that were only needed as a join key for COUNT.
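Conceptually, the emitted SQL changes shape as sketched below. This is an illustration only: apart from `gl_edge`, `source_id`, `target_id`, and the `target_kind` filter, every table, column, and literal here is invented.

```rust
// Before: the target node table is joined only so COUNT has an id to count.
const WITH_NODE_JOIN: &str = "\
SELECT g.id, count(n.id)
FROM gl_edge e
JOIN node_table n ON n.id = e.target_id
JOIN group_table g ON g.id = e.source_id
GROUP BY g.id";

// After: the edge endpoint column stands in for the node id, and a
// target_kind filter keeps entity types sharing a relationship kind apart.
const EDGE_ONLY: &str = "\
SELECT g.id, count(e.target_id)
FROM gl_edge e
JOIN group_table g ON g.id = e.source_id
WHERE e.target_kind = 'MergeRequest'
GROUP BY g.id";
```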

Introduces a LoweringContext on Input that the lower() pass writes to and the optimize() pass reads from, keeping lowering metadata out of the AST.

How it works

Lowering (lower.rs):

  • agg_skippable_nodes() identifies aggregation target nodes that can be eliminated: no filters, no node_ids, no property-based aggregations (e.g. sum on a node column). The node must only appear in COUNT aggregations referencing the node itself (not a specific property).
  • lower_aggregation_edge_only() emits a simplified query that reads only gl_edge and the group-by node table, skipping the target node table and its SIP CTE entirely.
  • agg_expr_with_skip() rewrites COUNT(node.id) to COUNT(edge.source_id) or COUNT(edge.target_id) depending on edge direction.
  • A source_kind/target_kind filter is added to the edge scan to ensure correctness — without it, all entity types sharing a relationship kind get counted together.
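The eligibility check and the COUNT rewrite above can be sketched as follows. All names (`AggNode`, `SkippedEndpoint`, both functions) are simplified stand-ins, not the actual lower.rs API:

```rust
/// Illustrative stand-in for an aggregation target node in the query plan.
struct AggNode {
    has_filters: bool,
    has_node_ids: bool,
    has_property_agg: bool, // e.g. sum(node.duration)
}

/// A node join can be eliminated only when the node contributes nothing
/// beyond serving as a COUNT key.
fn is_skippable(node: &AggNode) -> bool {
    !node.has_filters && !node.has_node_ids && !node.has_property_agg
}

/// Which endpoint of the edge the skipped node occupies.
enum SkippedEndpoint {
    Source,
    Target,
}

/// COUNT(node.id) is answered from the edge table alone by counting the
/// endpoint column that points at the skipped node.
fn rewrite_count_column(endpoint: SkippedEndpoint) -> &'static str {
    match endpoint {
        SkippedEndpoint::Source => "source_id",
        SkippedEndpoint::Target => "target_id",
    }
}
```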

Optimization (optimize.rs):

  • apply_sip_prefilter checks input.lowering.skipped_node_joins to skip building SIP sets for eliminated nodes that have no selectivity (no filters/node_ids). If the skipped node does have selectivity, SIP is preserved to narrow the edge scan.
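The decision in `apply_sip_prefilter` boils down to the following predicate. This is a sketch with assumed names; the real function inspects the plan rather than a flat struct:

```rust
use std::collections::HashSet;

/// Illustrative stand-in for what the optimizer knows about a node.
struct NodeInfo {
    name: String,
    has_filters: bool,
    has_node_ids: bool,
}

/// Build a SIP set unless the node's join was eliminated AND it carries no
/// selectivity. A selective skipped node keeps its SIP set so the edge scan
/// stays narrow.
fn should_build_sip(node: &NodeInfo, skipped_node_joins: &HashSet<String>) -> bool {
    let selective = node.has_filters || node.has_node_ids;
    !skipped_node_joins.contains(&node.name) || selective
}
```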

LoweringContext (input.rs):

  • New struct on Input with #[serde(skip)]. Stores skipped_node_joins: HashSet<String> populated by lower() and read by optimize().
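The lower()/optimize() handoff looks roughly like this. It is a sketch, not the real Input from input.rs, which carries more fields plus the `#[serde(skip)]` attribute on `lowering`; the `"mr"` name is purely illustrative:

```rust
use std::collections::HashSet;

/// Lowering metadata kept off the AST; in input.rs the field holding this
/// on Input is #[serde(skip)], so it never round-trips through serialization.
#[derive(Default)]
struct LoweringContext {
    skipped_node_joins: HashSet<String>,
}

#[derive(Default)]
struct Input {
    lowering: LoweringContext,
}

/// lower() records which node joins it eliminated.
fn lower(input: &mut Input) {
    input.lowering.skipped_node_joins.insert("mr".to_string());
}

/// optimize() later reads the same set without the AST having to carry it.
fn was_skipped(input: &Input, node: &str) -> bool {
    input.lowering.skipped_node_joins.contains(node)
}
```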

Benchmarks

Benchmarked all 25 optimization showcase queries (30M edges), comparing main (with the by_rel_source_kind + by_rel_target_kind projections) against this branch.

Rows read — winners

| Query | Description | Old rows | New rows | Reduction |
|-------|-------------|----------|----------|-----------|
| Q14 | Agg: MR count (no pinned IDs) | 1,610,043 | 33,475 | -98% |
| Q15 | Agg: jobs by status | 25,277,630 | 8,146,447 | -68% |
| Q18 | Agg: pipeline count (narrow NS) | 3,806,561 | 170,691 | -96% |

Peak memory — winners

| Query | Description | Old mem | New mem | Reduction |
|-------|-------------|---------|---------|-----------|
| Q14 | Agg: MR count (no pinned IDs) | 74MB | 19MB | -74% |
| Q15 | Agg: jobs by status | 2,867MB | 57MB | -98% |
| Q18 | Agg: pipeline count | 125MB | 45MB | -64% |

Q15 is the biggest win — the by_rel_target_kind projection caused a regression on this query pattern (ClickHouse picked the wrong projection), but edge-only aggregation sidesteps the problem entirely by not joining the node table at all.

Non-aggregation queries

No regressions. Traversals, neighbors, path-finding, and search queries are unchanged — the optimization only triggers for qualifying aggregation queries.

Full benchmark (all 25 queries, on top of !661 (merged))

| Query | Description | Old rows | New rows | Δ rows | Old mem | New mem | Δ mem |
|-------|-------------|----------|----------|--------|---------|---------|-------|
| Q01 | 3-node chain | 209,920 | 94,208 | same | 77MB | 51MB | same |
| Q02 | 4-node fan-out | 413,062 | 389,510 | same | 90MB | 30MB | same |
| Q03 | Neighbors both | 36,864 | 9,225 | same | 129MB | 24MB | same |
| Q04 | Neighbors outgoing | 7,875 | 1,731 | same | 45MB | 9MB | same |
| Q05 | Path depth 3 | 62,464 | 8,192 | same | 55MB | 20MB | same |
| Q06 | Path user->project | 3,173,376 | 3,156,992 | same | 71MB | 31MB | same |
| Q07 | Agg: count merged MRs | 2,739,831 | 2,466,943 | -10% | 86MB | 38MB | same |
| Q08 | Agg: sum pipe dur | 5,676,208 | 5,438,136 | same | 95MB | 48MB | same |
| Q09 | Agg: MR count pinned | 2,361,975 | 2,103,675 | -11% | 84MB | 43MB | same |
| Q10 | User->Group | 9,216 | 5,120 | same | 33MB | 9MB | same |
| Q11 | Pipe->Job | 2,235,465 | 2,188,361 | same | 113MB | 69MB | same |
| Q12 | All MRs | 76,800 | 89,088 | same | 107MB | 32MB | same |
| Q13 | MRs+Notes | 107,836 | 120,124 | same | 66MB | 33MB | same |
| Q14 | MR count no IDs | 1,610,043 | 33,475 | -98% | 74MB | 19MB | -74% |
| Q15 | Jobs by status | 25,277,630 | 8,146,447 | -68% | 2,867MB | 57MB | -98% |
| Q16 | Search pipes | 67,584 | 67,584 | same | 42MB | 42MB | same |
| Q17 | Narrow authored | 76,800 | 89,088 | same | 107MB | 32MB | same |
| Q18 | Pipeline count | 3,806,561 | 170,691 | -96% | 125MB | 45MB | -64% |
| Q19 | MR context | 1,463,612 | 634,172 | same | 101MB | 39MB | same |
| Q20 | Pipeline chain | 6,315,551 | 5,619,092 | same | 135MB | 55MB | same |
| Q21 | Agg: multi-filter | 2,739,831 | 2,466,943 | -10% | 86MB | 38MB | same |
| Q22 | Narrow 3-node | 1,446,204 | 662,844 | same | 101MB | 39MB | same |
| Q23 | Failed jobs | 40,041,878 | 40,026,828 | same | 456MB | 486MB | same |
| Q24 | All jobs | 24,576 | 24,576 | same | 37MB | 15MB | same |
| Q25 | Search MRs | 31,036 | 31,036 | same | 11MB | 11MB | same |
Edited by Michael Usachenko
