perf(compiler): skip node table join for edge-only count aggregations
## What does this MR do and why?
When an aggregation's target node has no filters, no pinned node_ids, and no property-based aggregation (e.g. sum on a node column), the compiler now skips joining the target node table entirely and aggregates directly from the edge table. This eliminates reading millions of node rows that were only needed as a join key for COUNT.
Introduces a `LoweringContext` on `Input` that the `lower()` pass writes to and the `optimize()` pass reads from, keeping lowering metadata out of the AST.
## How it works

**Lowering (`lower.rs`):**

- `agg_skippable_nodes()` identifies aggregation target nodes that can be eliminated: no filters, no `node_ids`, no property-based aggregations (e.g. `sum` on a node column). The node must only appear in `COUNT` aggregations referencing the node itself (not a specific property).
- `lower_aggregation_edge_only()` emits a simplified query that reads only `gl_edge` and the group-by node table, skipping the target node table and its SIP CTE entirely.
- `agg_expr_with_skip()` rewrites `COUNT(node.id)` to `COUNT(edge.source_id)` or `COUNT(edge.target_id)`, depending on edge direction.
- A `source_kind`/`target_kind` filter is added to the edge scan to ensure correctness — without it, all entity types sharing a relationship kind get counted together.
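The eligibility check above can be sketched as follows. This is a simplified model, not the real compiler code: `NodeSpec`, `AggSpec`, and `is_skippable` are illustrative names standing in for whatever types `lower.rs` actually uses.

```rust
#[derive(Debug)]
struct NodeSpec {
    alias: String,
    has_filters: bool,
    has_node_ids: bool,
}

#[derive(Debug)]
struct AggSpec {
    func: String,             // e.g. "count", "sum"
    node_alias: String,       // the aggregation's target node
    property: Option<String>, // Some(..) when aggregating a node column
}

/// A node join is skippable when the node carries no selectivity of its own
/// (no filters, no pinned node_ids) and every aggregation touching it is a
/// bare COUNT of the node itself rather than of one of its properties.
fn is_skippable(node: &NodeSpec, aggs: &[AggSpec]) -> bool {
    if node.has_filters || node.has_node_ids {
        return false;
    }
    aggs.iter()
        .filter(|a| a.node_alias == node.alias)
        .all(|a| a.func == "count" && a.property.is_none())
}
```

Under this model, a `sum` over a node column (or any property-based aggregation) keeps the join, because the value being aggregated lives in the node table rather than on the edge.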
**Optimization (`optimize.rs`):**

- `apply_sip_prefilter` checks `input.lowering.skipped_node_joins` to skip building SIP sets for eliminated nodes that have no selectivity (no filters/`node_ids`). If the skipped node does have selectivity, SIP is preserved to narrow the edge scan.
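The SIP decision reduces to a small predicate; the sketch below uses assumed names (`NodeInfo`, `needs_sip_set`) and is not the actual `optimize.rs` implementation.

```rust
use std::collections::HashSet;

struct LoweringContext {
    skipped_node_joins: HashSet<String>,
}

struct NodeInfo {
    has_filters: bool,
    has_node_ids: bool,
}

/// Build a SIP set for `alias` unless its join was eliminated by lower()
/// and it carries no selectivity; a selective skipped node keeps its SIP
/// set so the edge scan can still be narrowed.
fn needs_sip_set(ctx: &LoweringContext, alias: &str, node: &NodeInfo) -> bool {
    let selective = node.has_filters || node.has_node_ids;
    if ctx.skipped_node_joins.contains(alias) {
        selective
    } else {
        true
    }
}
```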
**`LoweringContext` (`input.rs`):**

- New struct on `Input` with `#[serde(skip)]`. Stores `skipped_node_joins: HashSet<String>`, populated by `lower()` and read by `optimize()`.
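A minimal sketch of the shape this takes; field and function names other than `skipped_node_joins` are illustrative, and serde is omitted here to keep the sketch dependency-free (in the real struct the `lowering` field carries `#[serde(skip)]` so it never round-trips through the serialized query).

```rust
use std::collections::HashSet;

#[derive(Default, Debug)]
pub struct LoweringContext {
    /// Aliases of target nodes whose table join lower() eliminated.
    pub skipped_node_joins: HashSet<String>,
}

#[derive(Default)]
pub struct Input {
    // ... parsed query fields elided ...
    pub lowering: LoweringContext, // #[serde(skip)] in the real struct
}

/// Illustrative helper: lower() records an eliminated node so that
/// optimize() can consult the set later without re-deriving it.
pub fn mark_skipped(input: &mut Input, alias: &str) {
    input.lowering.skipped_node_joins.insert(alias.to_string());
}
```

Keeping the set on a side context rather than annotating AST nodes means the AST stays a pure description of the query, and serialization of `Input` is unaffected.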
## Benchmarks
Benchmarked all 25 optimization showcase queries (30M edges). Compared main (with the `by_rel_source_kind` + `by_rel_target_kind` projections) vs this branch.
### Rows read — winners
| Query | Description | Old | New | Reduction |
|---|---|---|---|---|
| Q14 | Agg: MR count (no pinned IDs) | 1,610,043 | 33,475 | -98% |
| Q15 | Agg: jobs by status | 25,277,630 | 8,146,447 | -68% |
| Q18 | Agg: pipeline count (narrow NS) | 3,806,561 | 170,691 | -96% |
### Peak memory — winners
| Query | Description | Old | New | Reduction |
|---|---|---|---|---|
| Q14 | Agg: MR count (no pinned IDs) | 74MB | 19MB | -74% |
| Q15 | Agg: jobs by status | 2,867MB | 57MB | -98% |
| Q18 | Agg: pipeline count | 125MB | 45MB | -64% |
Q15 is the biggest win — the `by_rel_target_kind` projection caused a regression on this query pattern (ClickHouse picked the wrong projection), but edge-only aggregation sidesteps the problem entirely by not joining the node table at all.
### Non-aggregation queries
No regressions. Traversals, neighbors, path-finding, and search queries are unchanged — the optimization only triggers for qualifying aggregation queries.
### Full benchmark (all 25 queries, on top of !661 (merged))
| Query | Description | Old rows | New rows | Δ rows | Old mem | New mem | Δ mem |
|---|---|---|---|---|---|---|---|
| Q01 | 3-node chain | 209,920 | 94,208 | same | 77MB | 51MB | same |
| Q02 | 4-node fan-out | 413,062 | 389,510 | same | 90MB | 30MB | same |
| Q03 | Neighbors both | 36,864 | 9,225 | same | 129MB | 24MB | same |
| Q04 | Neighbors outgoing | 7,875 | 1,731 | same | 45MB | 9MB | same |
| Q05 | Path depth 3 | 62,464 | 8,192 | same | 55MB | 20MB | same |
| Q06 | Path user->project | 3,173,376 | 3,156,992 | same | 71MB | 31MB | same |
| Q07 | Agg: count merged MRs | 2,739,831 | 2,466,943 | -10% | 86MB | 38MB | same |
| Q08 | Agg: sum pipe dur | 5,676,208 | 5,438,136 | same | 95MB | 48MB | same |
| Q09 | Agg: MR count pinned | 2,361,975 | 2,103,675 | -11% | 84MB | 43MB | same |
| Q10 | User->Group | 9,216 | 5,120 | same | 33MB | 9MB | same |
| Q11 | Pipe->Job | 2,235,465 | 2,188,361 | same | 113MB | 69MB | same |
| Q12 | All MRs | 76,800 | 89,088 | same | 107MB | 32MB | same |
| Q13 | MRs+Notes | 107,836 | 120,124 | same | 66MB | 33MB | same |
| Q14 | MR count no IDs | 1,610,043 | 33,475 | -98% | 74MB | 19MB | -74% |
| Q15 | Jobs by status | 25,277,630 | 8,146,447 | -68% | 2,867MB | 57MB | -98% |
| Q16 | Search pipes | 67,584 | 67,584 | same | 42MB | 42MB | same |
| Q17 | Narrow authored | 76,800 | 89,088 | same | 107MB | 32MB | same |
| Q18 | Pipeline count | 3,806,561 | 170,691 | -96% | 125MB | 45MB | -64% |
| Q19 | MR context | 1,463,612 | 634,172 | same | 101MB | 39MB | same |
| Q20 | Pipeline chain | 6,315,551 | 5,619,092 | same | 135MB | 55MB | same |
| Q21 | Agg: multi-filter | 2,739,831 | 2,466,943 | -10% | 86MB | 38MB | same |
| Q22 | Narrow 3-node | 1,446,204 | 662,844 | same | 101MB | 39MB | same |
| Q23 | Failed jobs | 40,041,878 | 40,026,828 | same | 456MB | 486MB | same |
| Q24 | All jobs | 24,576 | 24,576 | same | 37MB | 15MB | same |
| Q25 | Search MRs | 31,036 | 31,036 | same | 11MB | 11MB | same |