feat(indexer): add traversal_path to gl_edge table

What does this MR do and why?

The gl_edge table currently lacks a traversal_path column, unlike every node table. As a result, ClickHouse has to scan the entire edge table during traversal queries and can only discard out-of-scope rows after joining to the security-filtered node tables. This MR adds traversal_path as the first sort key column on gl_edge. ClickHouse can then prune edge data early with startsWith prefix filters, which reduces I/O before any joins run.
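For intuition on why the column has to lead the sort key, here is a toy model in Rust. Once rows are sorted by traversal_path, every row matching a given prefix sits in one contiguous run, so only the fixed-size blocks ("granules") covering that run need to be read. This is a simplified illustration, not ClickHouse's actual sparse primary index. The function names, the granule size and the sample paths are all assumptions.

```rust
use std::ops::Range;

/// Sorts raw traversal_path values, mimicking a table whose ORDER BY
/// starts with traversal_path.
fn sorted_paths(raw: &[&str]) -> Vec<String> {
    let mut rows: Vec<String> = raw.iter().map(|s| s.to_string()).collect();
    rows.sort();
    rows
}

/// Returns the contiguous row range whose traversal_path starts with `prefix`.
/// Because rows are sorted, all matches are adjacent.
fn matching_range(sorted: &[String], prefix: &str) -> Range<usize> {
    // First row >= prefix (every match is >= prefix).
    let start = sorted.partition_point(|p| p.as_str() < prefix);
    // The matches form a run at the front of the remaining rows.
    let len = sorted[start..].partition_point(|p| p.starts_with(prefix));
    start..start + len
}

/// Toy granule pruning: which blocks of `granule_size` rows must be read to
/// answer startsWith(traversal_path, prefix). All other blocks are skipped.
fn granules_to_read(sorted: &[String], prefix: &str, granule_size: usize) -> Vec<usize> {
    assert!(granule_size > 0, "granule_size must be positive");
    let r = matching_range(sorted, prefix);
    if r.is_empty() {
        return Vec::new();
    }
    (r.start / granule_size..=(r.end - 1) / granule_size).collect()
}

fn main() {
    let rows = sorted_paths(&[
        "1/10/", "1/10/", "1/11/", "2/20/", "2/20/",
        "2/21/", "3/30/", "3/30/", "0/", "0/",
    ]);
    // 10 rows in granules of 2 gives 5 granules; the "2/" prefix touches 2 of them.
    println!("{:?}", granules_to_read(&rows, "2/", 2));
}
```

The trailing slash matters here: the prefix "1/" matches "1/10/" but not "10/...". Without traversal_path leading the sort key, matching rows would be scattered across the table, and every granule would have to be read.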

An edge's traversal_path comes from the parent row that produces it (e.g. an MR's author_id edge uses the MR's traversal_path). Edges produced by global entities such as User, which have no namespace, default to '0/'.
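The rule can be sketched as follows, assuming a hypothetical, simplified edge record (not the MR's actual PreparedEdge type). An absent traversal_path on the producing row stands in for a global entity. Edge direction and ids are illustrative only.

```rust
/// traversal_path for edges produced by global entities (e.g. User).
const GLOBAL_TRAVERSAL_PATH: &str = "0/";

/// Hypothetical, simplified edge record used only for this sketch.
#[derive(Debug, PartialEq)]
struct Edge {
    source_id: u64,
    target_id: u64,
    traversal_path: String,
}

/// Builds an edge from the row that produces it. `source_path` is that row's
/// traversal_path, or None when the row is a global entity.
fn edge_from_row(source_id: u64, source_path: Option<&str>, target_id: u64) -> Edge {
    Edge {
        source_id,
        target_id,
        // Inherit the producing row's path; fall back to the global default.
        traversal_path: source_path.unwrap_or(GLOBAL_TRAVERSAL_PATH).to_string(),
    }
}

fn main() {
    // MR 7 lives under "1/10/"; its author edge to user 42 carries the MR's path.
    let mr_author = edge_from_row(7, Some("1/10/"), 42);
    // An edge produced by a global entity gets the default path.
    let from_global = edge_from_row(42, None, 99);
    println!("{mr_author:?}\n{from_global:?}");
}
```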

What changed

  • Schema (fixtures/schema/graph.sql): new traversal_path String DEFAULT '0/' column, ORDER BY and PRIMARY KEY updated to start with it
  • Ontology (crates/ontology/src/lib.rs): added to EDGE_RESERVED_COLUMNS
  • SDLC indexer (prepare.rs, transform.rs): PreparedEdge and PreparedEdgeEtl carry a namespaced flag; all three edge SQL builders emit traversal_path from source data when namespaced, '0/' when global
  • Code indexer (arrow_converter.rs): edge Arrow batches now include traversal_path from the project
  • Query engine (lower.rs): EDGE_ALIAS_SUFFIXES includes "path" for the new column
  • Security filtering (security.rs): gl_edge is no longer skipped by should_apply_security_filter(), so startsWith prefix filters are injected on edge table scans
  • Simulator: EdgeRecord carries traversal_path, schema/parquet writers updated, default ORDER BY starts with traversal_path
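Because gl_edge is no longer skipped by the security filter, an edge scan can receive the same kind of startsWith predicate as a node scan. The Rust sketch below builds such a predicate. It is a minimal illustration, not the code in security.rs. The function names, the "e.traversal_path" column name and the path-format check are assumptions, with the format inferred from the '0/' default and "a/b/"-style paths.

```rust
/// Returns true if `path` looks like a traversal_path: one or more digit
/// segments, each followed by '/', e.g. "0/" or "1/10/". Assumed format.
fn is_traversal_path(path: &str) -> bool {
    match path.strip_suffix('/') {
        Some(body) => body
            .split('/')
            .all(|seg| !seg.is_empty() && seg.bytes().all(|b| b.is_ascii_digit())),
        None => false,
    }
}

/// Builds a WHERE fragment keeping only rows whose traversal_path starts with
/// one of the caller's authorized prefixes. Returns None when nothing is
/// authorized or any prefix is malformed, so the caller can deny access
/// instead of emitting an overly broad filter. Validated paths contain only
/// digits and '/', so embedding them in a quoted literal is safe here.
fn traversal_path_filter(column: &str, authorized: &[&str]) -> Option<String> {
    if authorized.is_empty() || !authorized.iter().all(|p| is_traversal_path(p)) {
        return None;
    }
    let clauses: Vec<String> = authorized
        .iter()
        .map(|p| format!("startsWith({column}, '{p}')"))
        .collect();
    Some(format!("({})", clauses.join(" OR ")))
}

fn main() {
    // Hypothetical edge-table column; the real alias naming comes from lower.rs.
    if let Some(filter) = traversal_path_filter("e.traversal_path", &["1/10/", "2/"]) {
        println!("{filter}");
    }
}
```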

Test coverage

Every integration test that creates edges now verifies that the correct traversal_path was written, using assert_edges_have_traversal_path (checks count and path in a single assertion) and assert_edge_count_for_traversal_path (for edges that span multiple namespaces).

Covered: groups, projects, merge requests, notes, milestones, MR diffs, work items, labels, CI pipelines/stages/jobs, all security entities, and code indexing.
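The two checks differ as shown in the sketch below, which models them over in-memory rows in Rust. The real helpers query ClickHouse, and their signatures are not shown in this MR. The names below are hypothetical stand-ins that return values instead of asserting.

```rust
/// Minimal stand-in for an edge row read back from gl_edge.
struct EdgeRow {
    traversal_path: String,
}

/// Builds rows from a list of paths (test fixture helper).
fn edge_rows(paths: &[&str]) -> Vec<EdgeRow> {
    paths.iter().map(|p| EdgeRow { traversal_path: p.to_string() }).collect()
}

/// "Count + path in one check": exactly `expected_count` edges exist, and
/// every one of them carries `expected_path`.
fn edges_have_traversal_path(rows: &[EdgeRow], expected_path: &str, expected_count: usize) -> bool {
    rows.len() == expected_count && rows.iter().all(|r| r.traversal_path == expected_path)
}

/// Per-path count, for edge sets that span several namespaces.
fn edge_count_for_traversal_path(rows: &[EdgeRow], path: &str) -> usize {
    rows.iter().filter(|r| r.traversal_path == path).count()
}

fn main() {
    let mixed = edge_rows(&["1/10/", "1/10/", "1/11/"]);
    // A single-path check fails on a multi-namespace set...
    println!("{}", edges_have_traversal_path(&mixed, "1/10/", 3));
    // ...so such sets are checked per path instead.
    println!("{}", edge_count_for_traversal_path(&mixed, "1/11/"));
}
```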

Resolves #145

Testing

Integration and unit tests

Performance Analysis

  • This merge request does not introduce any performance regression.
