perf(compiler): skip node table join for edge-only count aggregations
## What does this MR do and why?
When an aggregation's target node has no filters, no pinned node_ids, and no property-based aggregation (e.g. sum on a node column), the compiler now skips joining the target node table entirely and aggregates directly from the edge table. This eliminates reading millions of node rows that were only needed as a join key for COUNT.
Introduces a `LoweringContext` on `Input` that the `lower()` pass writes to and the `optimize()` pass reads from, keeping lowering metadata out of the AST.
## How it works

**Lowering (`lower.rs`):**

- `agg_skippable_nodes()` identifies aggregation target nodes that can be eliminated: no filters, no `node_ids`, no property-based aggregations (e.g. `sum` on a node column). The node must only appear in `COUNT` aggregations referencing the node itself (not a specific property).
- `lower_aggregation_edge_only()` emits a simplified query that reads only `gl_edge` and the group-by node table, skipping the target node table and its SIP CTE entirely.
- `agg_expr_with_skip()` rewrites `COUNT(node.id)` to `COUNT(edge.source_id)` or `COUNT(edge.target_id)`, depending on edge direction.
- A `source_kind`/`target_kind` filter is added to the edge scan to ensure correctness — without it, all entity types sharing a relationship kind get counted together.
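The eligibility check above can be sketched as follows. This is a simplified model, not the real compiler code: `NodeSpec`, `AggSpec`, and `is_skippable` are illustrative names standing in for whatever types `lower.rs` actually uses.

```rust
#[derive(Debug)]
struct NodeSpec {
    alias: String,
    has_filters: bool,
    has_node_ids: bool,
}

#[derive(Debug)]
struct AggSpec {
    func: String,             // e.g. "count", "sum"
    node_alias: String,       // the aggregation's target node
    property: Option<String>, // Some(..) when aggregating a node column
}

/// A node join is skippable when the node carries no selectivity of its own
/// (no filters, no pinned node_ids) and every aggregation touching it is a
/// bare COUNT of the node itself rather than of one of its properties.
fn is_skippable(node: &NodeSpec, aggs: &[AggSpec]) -> bool {
    if node.has_filters || node.has_node_ids {
        return false;
    }
    aggs.iter()
        .filter(|a| a.node_alias == node.alias)
        .all(|a| a.func == "count" && a.property.is_none())
}
```

Under this model, a `sum` over a node column (or any property-based aggregation) keeps the join, because the value being aggregated lives in the node table rather than on the edge.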
**Optimization (`optimize.rs`):**

- `apply_sip_prefilter` checks `input.lowering.skipped_node_joins` to skip building SIP sets for eliminated nodes that have no selectivity (no filters/`node_ids`). If the skipped node does have selectivity, SIP is preserved to narrow the edge scan.
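The SIP decision reduces to a small predicate; the sketch below uses assumed names (`NodeInfo`, `needs_sip_set`) and is not the actual `optimize.rs` implementation.

```rust
use std::collections::HashSet;

struct LoweringContext {
    skipped_node_joins: HashSet<String>,
}

struct NodeInfo {
    has_filters: bool,
    has_node_ids: bool,
}

/// Build a SIP set for `alias` unless its join was eliminated by lower()
/// and it carries no selectivity; a selective skipped node keeps its SIP
/// set so the edge scan can still be narrowed.
fn needs_sip_set(ctx: &LoweringContext, alias: &str, node: &NodeInfo) -> bool {
    let selective = node.has_filters || node.has_node_ids;
    if ctx.skipped_node_joins.contains(alias) {
        selective
    } else {
        true
    }
}
```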
**`LoweringContext` (`input.rs`):**

- New struct on `Input` with `#[serde(skip)]`. Stores `skipped_node_joins: HashSet<String>`, populated by `lower()` and read by `optimize()`.
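A minimal sketch of the shape this takes; field and function names other than `skipped_node_joins` are illustrative, and serde is omitted here to keep the sketch dependency-free (in the real struct the `lowering` field carries `#[serde(skip)]` so it never round-trips through the serialized query).

```rust
use std::collections::HashSet;

#[derive(Default, Debug)]
pub struct LoweringContext {
    /// Aliases of target nodes whose table join lower() eliminated.
    pub skipped_node_joins: HashSet<String>,
}

#[derive(Default)]
pub struct Input {
    // ... parsed query fields elided ...
    pub lowering: LoweringContext, // #[serde(skip)] in the real struct
}

/// Illustrative helper: lower() records an eliminated node so that
/// optimize() can consult the set later without re-deriving it.
pub fn mark_skipped(input: &mut Input, alias: &str) {
    input.lowering.skipped_node_joins.insert(alias.to_string());
}
```

Keeping the set on a side context rather than annotating AST nodes means the AST stays a pure description of the query, and serialization of `Input` is unaffected.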
## Benchmarks
Benchmarked all 25 optimization showcase queries (30M edges). Compared main (with the `by_rel_source_kind` + `by_rel_target_kind` projections) vs this branch.
### Rows read — winners
| Query | Description | Old | New | Reduction |
|---|---|---|---|---|
| Q14 | Agg: MR count (no pinned IDs) | 1,610,043 | 33,475 | -98% |
| Q15 | Agg: jobs by status | 25,277,630 | 8,146,447 | -68% |
| Q18 | Agg: pipeline count (narrow NS) | 3,806,561 | 170,691 | -96% |
### Peak memory — winners
| Query | Description | Old | New | Reduction |
|---|---|---|---|---|
| Q14 | Agg: MR count (no pinned IDs) | 74MB | 19MB | -74% |
| Q15 | Agg: jobs by status | 2,867MB | 57MB | -98% |
| Q18 | Agg: pipeline count | 125MB | 45MB | -64% |
Q15 is the biggest win — the `by_rel_target_kind` projection caused a regression on this query pattern (ClickHouse picked the wrong projection), but edge-only aggregation sidesteps the problem entirely by not joining the node table at all.
### Non-aggregation queries
No regressions. Traversals, neighbors, path-finding, and search queries are unchanged — the optimization only triggers for qualifying aggregation queries.
### Full benchmark (all 25 queries, on top of !661 (merged))
| Query | Description | Old rows | New rows | Δ rows | Old mem | New mem | Δ mem |
|---|---|---|---|---|---|---|---|
| Q01 | 3-node chain | 209,920 | 94,208 | same | 77MB | 51MB | same |
| Q02 | 4-node fan-out | 413,062 | 389,510 | same | 90MB | 30MB | same |
| Q03 | Neighbors both | 36,864 | 9,225 | same | 129MB | 24MB | same |
| Q04 | Neighbors outgoing | 7,875 | 1,731 | same | 45MB | 9MB | same |
| Q05 | Path depth 3 | 62,464 | 8,192 | same | 55MB | 20MB | same |
| Q06 | Path user->project | 3,173,376 | 3,156,992 | same | 71MB | 31MB | same |
| Q07 | Agg: count merged MRs | 2,739,831 | 2,466,943 | -10% | 86MB | 38MB | same |
| Q08 | Agg: sum pipe dur | 5,676,208 | 5,438,136 | same | 95MB | 48MB | same |
| Q09 | Agg: MR count pinned | 2,361,975 | 2,103,675 | -11% | 84MB | 43MB | same |
| Q10 | User->Group | 9,216 | 5,120 | same | 33MB | 9MB | same |
| Q11 | Pipe->Job | 2,235,465 | 2,188,361 | same | 113MB | 69MB | same |
| Q12 | All MRs | 76,800 | 89,088 | same | 107MB | 32MB | same |
| Q13 | MRs+Notes | 107,836 | 120,124 | same | 66MB | 33MB | same |
| Q14 | MR count no IDs | 1,610,043 | 33,475 | -98% | 74MB | 19MB | -74% |
| Q15 | Jobs by status | 25,277,630 | 8,146,447 | -68% | 2,867MB | 57MB | -98% |
| Q16 | Search pipes | 67,584 | 67,584 | same | 42MB | 42MB | same |
| Q17 | Narrow authored | 76,800 | 89,088 | same | 107MB | 32MB | same |
| Q18 | Pipeline count | 3,806,561 | 170,691 | -96% | 125MB | 45MB | -64% |
| Q19 | MR context | 1,463,612 | 634,172 | same | 101MB | 39MB | same |
| Q20 | Pipeline chain | 6,315,551 | 5,619,092 | same | 135MB | 55MB | same |
| Q21 | Agg: multi-filter | 2,739,831 | 2,466,943 | -10% | 86MB | 38MB | same |
| Q22 | Narrow 3-node | 1,446,204 | 662,844 | same | 101MB | 39MB | same |
| Q23 | Failed jobs | 40,041,878 | 40,026,828 | same | 456MB | 486MB | same |
| Q24 | All jobs | 24,576 | 24,576 | same | 37MB | 15MB | same |
| Q25 | Search MRs | 31,036 | 31,036 | same | 11MB | 11MB | same |