feat(indexer): add traversal_path to gl_edge table
What does this MR do and why?
The gl_edge table currently lacks traversal_path, unlike all node tables. This forces ClickHouse to scan the full edge table during traversal queries, then filter results only after joining to security-filtered node tables. This MR adds traversal_path as the first sort key column on gl_edge, enabling ClickHouse to prune edge data early using prefix filtering -- reducing I/O before joins happen.
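As a rough illustration of why a leading sort key helps, the sketch below models the edge table as an in-memory slice sorted by traversal_path first. It is an analogy only, not ClickHouse's actual sparse-index mechanism. The `Edge` struct, its field names, and `prune_by_prefix` are hypothetical and do not come from this MR.

```rust
/// Hypothetical in-memory edge row; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Edge {
    /// Declared first so the derived ordering sorts by it first,
    /// mirroring a sort key that starts with traversal_path.
    traversal_path: String,
    source_id: u64,
    target_id: u64,
}

/// Returns the contiguous run of rows whose traversal_path starts with `prefix`.
/// `sorted` must already be ordered by traversal_path (e.g. via `sort()`).
fn prune_by_prefix<'a>(sorted: &'a [Edge], prefix: &str) -> &'a [Edge] {
    // Binary search for the first row whose path is >= prefix.
    let start = sorted.partition_point(|e| e.traversal_path.as_str() < prefix);
    // In lexicographic order, rows sharing the prefix are contiguous from `start`,
    // so a second binary search finds where the run ends.
    let len = sorted[start..].partition_point(|e| e.traversal_path.starts_with(prefix));
    &sorted[start..start + len]
}
```

Because matching rows form one contiguous range, everything outside that range is skipped without being read, which is the effect the new sort key is meant to give edge scans.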
The edge's traversal_path comes from the parent row that produces the edge (e.g. an MR's author_id edge uses the MR's traversal_path). Global entities like User default to '0/'.
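The inheritance rule can be sketched as follows. This is a minimal illustration, not the indexer's code: `ParentRow`, `GLOBAL_TRAVERSAL_PATH`, and `edge_traversal_path` are hypothetical names, and the real indexer emits the value from SQL builders rather than a Rust function like this.

```rust
/// Path assigned to edges produced by global entities such as User.
const GLOBAL_TRAVERSAL_PATH: &str = "0/";

/// Hypothetical stand-in for the row that produces an edge.
struct ParentRow<'a> {
    /// `Some(path)` for namespaced entities, `None` for global ones.
    traversal_path: Option<&'a str>,
}

/// Returns the traversal_path an edge inherits from its parent row,
/// falling back to the global default when the parent has none.
fn edge_traversal_path(parent: &ParentRow<'_>) -> String {
    parent
        .traversal_path
        .unwrap_or(GLOBAL_TRAVERSAL_PATH)
        .to_string()
}
```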
What changed
- Schema (fixtures/schema/graph.sql): new traversal_path String DEFAULT '0/' column; ORDER BY and PRIMARY KEY updated to start with it
- Ontology (crates/ontology/src/lib.rs): traversal_path added to EDGE_RESERVED_COLUMNS
- SDLC indexer (prepare.rs, transform.rs): PreparedEdge and PreparedEdgeEtl carry a namespaced flag; all three edge SQL builders emit traversal_path from source data when namespaced, '0/' when global
- Code indexer (arrow_converter.rs): edge Arrow batches now include traversal_path from the project
- Query engine (lower.rs): EDGE_ALIAS_SUFFIXES includes "path" for the new column
- Security filtering (security.rs): gl_edge is no longer skipped by should_apply_security_filter(), so startsWith prefix filters are injected on edge table scans
- Simulator: EdgeRecord carries traversal_path, schema/parquet writers updated, default ORDER BY starts with traversal_path
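For reviewers who want a picture of the injected filter, here is a hedged sketch of the kind of predicate the security-filtering change makes possible on edge scans. `traversal_path_filter` and `escape_sql_literal` are hypothetical helpers, not functions from security.rs, and the actual code may bind parameters instead of inlining literals. The only ClickHouse function assumed is `startsWith(s, prefix)`.

```rust
/// Escapes backslashes and single quotes for use inside a single-quoted SQL literal.
fn escape_sql_literal(s: &str) -> String {
    s.replace('\\', "\\\\").replace('\'', "\\'")
}

/// Builds a predicate restricting the table aliased as `alias` to rows whose
/// traversal_path starts with one of `allowed_prefixes`.
fn traversal_path_filter(alias: &str, allowed_prefixes: &[&str]) -> String {
    if allowed_prefixes.is_empty() {
        // No accessible namespaces: match nothing rather than everything.
        return "0".to_string();
    }
    let clauses: Vec<String> = allowed_prefixes
        .iter()
        .map(|p| {
            format!(
                "startsWith({}.traversal_path, '{}')",
                alias,
                escape_sql_literal(p)
            )
        })
        .collect();
    format!("({})", clauses.join(" OR "))
}
```

With traversal_path leading the sort key, a predicate of this shape on the edge alias can be used for pruning instead of only filtering after the join.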
Test coverage
Every integration test that creates edges now validates the correct traversal_path was written via assert_edges_have_traversal_path (count + path in one check) and assert_edge_count_for_traversal_path (for edges that span multiple namespaces).
Covered: groups, projects, merge requests, notes, milestones, MR diffs, work items, labels, CI pipelines/stages/jobs, all security entities, and code indexing.
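The "count + path in one check" idea can be sketched like this. This is not the real assert_edges_have_traversal_path helper, whose signature is not shown in this description. `EdgeRow`, its field names, and `check_edges_have_traversal_path` are assumptions made for illustration.

```rust
/// Hypothetical edge row as read back from gl_edge in a test.
struct EdgeRow {
    relationship_kind: String,
    traversal_path: String,
}

/// Succeeds only if exactly `expected_count` edges of `kind` exist
/// and every one of them carries `expected_path`.
fn check_edges_have_traversal_path(
    rows: &[EdgeRow],
    kind: &str,
    expected_path: &str,
    expected_count: usize,
) -> Result<(), String> {
    let matching: Vec<&EdgeRow> = rows
        .iter()
        .filter(|r| r.relationship_kind == kind)
        .collect();
    // Count check: catches missing or duplicated edges.
    if matching.len() != expected_count {
        return Err(format!(
            "expected {} {} edges, found {}",
            expected_count,
            kind,
            matching.len()
        ));
    }
    // Path check: catches edges written with the wrong traversal_path.
    if let Some(bad) = matching.iter().find(|r| r.traversal_path != expected_path) {
        return Err(format!(
            "{} edge has traversal_path {:?}, expected {:?}",
            kind, bad.traversal_path, expected_path
        ));
    }
    Ok(())
}
```

Combining both checks means an edge written with the default '0/' where a namespaced path was expected fails the test even when the edge count is right.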
Related Issues
Resolves #145
Testing
Integration and unit tests
Performance Analysis
- This merge request does not introduce any performance regression. If a performance regression is expected, explain why.