perf(schema): rename by_rel to by_rel_source_kind and add by_rel_target_kind projection
What does this MR do and why?
Renames the existing by_rel projection to by_rel_source_kind and adds a new by_rel_target_kind projection with PK (relationship_kind, target_kind, target_id, source_id, traversal_path, source_kind). This lets ClickHouse prune granules on relationship type + target entity type via PK binary search, complementing the existing by_rel_source_kind projection which prunes on source entity type.
Follows up on !660 (merged) which added source_kind to the by_rel projection.
How it works
Queries that filter edges by relationship_kind and target_kind (reverse traversals, aggregations grouping by target) previously had to scan all edges for a relationship type and filter target_kind post-scan. With target_kind as the second PK column in by_rel_target_kind, ClickHouse binary-searches directly to the target entity slice.
For example, Q02 (4-node fan-out) traverses IN_PROJECT edges targeting a specific project. The old by_rel_source_kind projection reads all source kinds for IN_PROJECT and filters post-scan. The new by_rel_target_kind projection binary-searches to just the Project target slice.
gl_edge now has 4 projections:
-
by_source—(source_id, relationship_kind, ...) -
by_target—(target_id, relationship_kind, ...) -
by_rel_source_kind—(relationship_kind, source_kind, source_id, ...)(renamed fromby_rel) -
by_rel_target_kind—(relationship_kind, target_kind, target_id, ...)
Benchmarks
Benchmarked all 25 optimization showcase queries (30M edges). Compared old by_rel (relationship_kind, source_kind, source_id, ...) vs new layout with both by_rel_source_kind and by_rel_target_kind. ClickHouse automatically selects the best projection per query.
Rows read — winners
| Query | Description | Old | New | Reduction |
|---|---|---|---|---|
| Q02 | 4-node fan-out (pinned project) | 932,230 | 413,062 | -56% |
| Q11 | Project->Pipeline->Job | 8,194,143 | 2,235,465 | -73% |
| Q14 | Agg: MR count (no pinned IDs) | 9,009,601 | 1,610,043 | -82% |
| Q18 | Agg: pipeline count (narrow NS) | 9,009,601 | 3,806,561 | -58% |
Latency — winners
| Query | Description | Old | New | Improvement |
|---|---|---|---|---|
| Q01 | 3-node chain | 301ms | 157ms | -48% |
| Q11 | Project->Pipeline->Job | 546ms | 376ms | -31% |
| Q24 | All jobs traversal | 35ms | 21ms | -40% |
Peak memory — winners
| Query | Description | Old | New | Reduction |
|---|---|---|---|---|
| Q14 | Agg: MR count (no pinned IDs) | 126MB | 74MB | -41% |
Regressions
Q15 (jobs by status aggregation) regresses significantly — ClickHouse picks by_rel_target_kind but the query pattern doesn't benefit from it. This query is better served by the edge-only aggregation optimization via !662 (merged).
| Query | Description | Old rows | New rows | Δ rows | Old ms | New ms | Δ ms |
|---|---|---|---|---|---|---|---|
| Q15 | Agg: jobs by status | 9,009,601 | 25,277,630 | +181% | 828 | 5,877 | +610% |
Full benchmark (all 25 queries)
| Query | Description | Old rows | New rows | Δ rows | Old ms | New ms | Δ ms | Old mem | New mem | Δ mem |
|---|---|---|---|---|---|---|---|---|---|---|
| Q01 | 3-node chain | 136,192 | 209,920 | +54% | 301 | 157 | -48% | 95MB | 77MB | -19% |
| Q02 | 4-node fan-out | 932,230 | 413,062 | -56% | 275 | 281 | same | 82MB | 90MB | same |
| Q03 | Neighbors both | 32,768 | 36,864 | same | 144 | 160 | same | 98MB | 129MB | same |
| Q04 | Neighbors outgoing | 5,827 | 7,875 | same | 26 | 24 | same | 33MB | 45MB | same |
| Q05 | Path depth 3 | 47,104 | 62,464 | same | 153 | 243 | same | 54MB | 55MB | same |
| Q06 | Path user->project | 3,168,946 | 3,173,376 | same | 169 | 210 | same | 66MB | 71MB | same |
| Q07 | Agg: count merged MRs | 2,517,607 | 2,739,831 | +9% | 373 | 480 | +29% | 74MB | 86MB | +16% |
| Q08 | Agg: sum pipeline dur | 5,482,656 | 5,676,208 | +4% | 674 | 688 | same | 79MB | 95MB | +20% |
| Q09 | Agg: MR count (pinned) | 2,141,295 | 2,361,975 | +10% | 336 | 409 | +22% | 76MB | 84MB | same |
| Q10 | User->Group | 7,168 | 9,216 | same | 22 | 26 | same | 21MB | 33MB | same |
| Q11 | Pipe->Job | 8,194,143 | 2,235,465 | -73% | 546 | 376 | -31% | 99MB | 113MB | same |
| Q12 | All MRs | 76,800 | 76,800 | same | 43 | 38 | same | 95MB | 107MB | same |
| Q13 | MRs+Notes | 107,836 | 107,836 | same | 40 | 46 | same | 64MB | 66MB | same |
| Q14 | MR count no IDs | 9,009,601 | 1,610,043 | -82% | 189 | 261 | +38% | 126MB | 74MB | -41% |
| Q15 | Jobs by status | 9,009,601 | 25,277,630 | +181% | 828 | 5,877 | +610% | 154MB | 2,867MB | +1762% |
| Q16 | Search pipes | 67,584 | 67,584 | same | 8 | 9 | same | 42MB | 42MB | same |
| Q17 | Narrow authored | 76,800 | 76,800 | same | 30 | 37 | same | 95MB | 107MB | same |
| Q18 | Pipeline count | 9,009,601 | 3,806,561 | -58% | 298 | 444 | +49% | 126MB | 125MB | same |
| Q19 | MR context | 983,860 | 1,463,612 | +49% | 169 | 275 | +63% | 90MB | 101MB | same |
| Q20 | Pipeline chain | 6,272,032 | 6,315,551 | same | 395 | 514 | +30% | 118MB | 135MB | same |
| Q21 | Agg: multi-filter | 2,517,607 | 2,739,831 | +9% | 255 | 349 | +37% | 74MB | 86MB | same |
| Q22 | Narrow 3-node | 1,031,988 | 1,446,204 | +40% | 174 | 232 | +33% | 90MB | 101MB | same |
| Q23 | Failed jobs | 40,055,848 | 40,041,878 | same | 2,067 | 2,444 | +18% | 481MB | 456MB | same |
| Q24 | All jobs | 25,600 | 24,576 | same | 35 | 21 | -40% | 43MB | 37MB | same |
| Q25 | Search MRs | 31,036 | 31,036 | same | 9 | 10 | same | 11MB | 11MB | same |