fix(query): prune filtered path-finding frontiers

Summary

This fixes the filtered path-finding timeout by pruning code graph frontiers earlier and narrowing code path searches to traversal paths that contain both filtered endpoints.

The traversal-path join is used only when every endpoint and edge table involved exposes traversal_path, so User path-finding queries do not take this code path.
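
As a rough illustration of the gating rule above, the sketch below checks that every table taking part in a query exposes a traversal_path column before the scoped join is considered. TableInfo and supports_traversal_path_scope are hypothetical names invented for this example, as are the column lists; the real compiler types are not shown in this description.

```rust
/// Hypothetical description of a table taking part in a path-finding query.
/// The name and column lists are illustrative, not the real schema.
#[derive(Clone, Copy, Debug)]
pub struct TableInfo {
    pub name: &'static str,
    pub columns: &'static [&'static str],
}

/// Returns true only when every endpoint and edge table has a
/// `traversal_path` column. An empty table list is treated as unsupported,
/// which is an assumption made for this sketch.
pub fn supports_traversal_path_scope(tables: &[TableInfo]) -> bool {
    !tables.is_empty()
        && tables
            .iter()
            .all(|t| t.columns.contains(&"traversal_path"))
}
```

Under these assumptions, a Definition endpoint plus an edge table that both list traversal_path would pass the check, while adding a User table without that column would make it return false, so the query would skip the traversal-path join.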

Changes

  • Add traversal-path scoped pruning for path_finding when endpoint and edge tables support traversal_path.
  • Keep filtered endpoint hop frontiers anchored through nf* CTEs and apply relationship-kind filtering to those hop CTEs.
  • Carry traversal-path scope and endpoint kind into path hop-frontier CTEs so deeper hop filters stay narrow.
  • Preserve lowerer-selected nf* columns and push filtered Definition endpoint predicates into LIMIT 1 BY.
  • Keep latest-main traversal-path trie collapse and compact the resulting filters with arrayExists.
  • Add data-correctness coverage for filtered Definition path finding staying on one traversal_path.
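
One possible reading of the trie collapse and arrayExists compaction in the list above is sketched here. It is not the compiler's actual code: collapse_prefixes and render_scope_filter are hypothetical helpers, and the sketch assumes traversal paths are slash-terminated strings of digits such as 1/2/. The ClickHouse functions arrayExists and startsWith are real; how the lowerer emits them is an assumption.

```rust
/// Collapse a set of traversal paths so that no kept path is covered by
/// another kept path that is its prefix. Assumes every path ends with '/',
/// so "1/2/" does not accidentally match "1/23/".
pub fn collapse_prefixes(mut paths: Vec<String>) -> Vec<String> {
    paths.sort();
    paths.dedup();
    let mut kept: Vec<String> = Vec::new();
    for p in paths {
        // After sorting, a prefix of `p` that was kept is always the most
        // recently kept entry, so checking the last one is sufficient.
        let covered = kept.last().map_or(false, |k| p.starts_with(k.as_str()));
        if !covered {
            kept.push(p);
        }
    }
    kept
}

/// Render one compact predicate instead of a chain of OR-ed startsWith calls.
/// Inputs are assumed to contain only digits and '/', so no escaping is done
/// here; a real query builder should bind parameters instead.
pub fn render_scope_filter(prefixes: &[String]) -> String {
    let items: Vec<String> = prefixes.iter().map(|p| format!("'{}'", p)).collect();
    format!(
        "arrayExists(p -> startsWith(traversal_path, p), [{}])",
        items.join(", ")
    )
}
```

With the inputs 1/2/3/, 1/2/, 7/ and a duplicate 1/2/, the collapse keeps only 1/2/ and 7/, and the rendered filter has two array elements rather than four predicates.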

Base

  • Rebased on latest origin/main: e558c36a
  • Branch head: b429a88104d2e56c24ab2682642dc5577b068cfe

Validation

  • mise exec -- cargo fmt --all --check
  • mise lint:code
  • mise exec -- cargo test -p compiler path_finding_ --lib
  • mise exec -- cargo test -p compiler security --lib
  • mise exec -- cargo test -p integration-testkit splits_data_correctness_seed
  • cargo test -p integration-tests --test containers data_correctness -- --nocapture
  • git diff --check

Read-Only ClickHouse Validation

Compiled the same Definition path_finding query locally via the compile to run_query flow, then ran the resulting SELECT-only SQL through kubectl as gkg_reader against the active v19 graph tables with the query cache disabled. No mutations were run.

| Run | Query shape | Status | Wall time | CH elapsed | Read rows | Read bytes | Rows to read | Memory | Result rows |
|---|---|---|---|---|---|---|---|---|---|
| codex_pf_before_main_20260429T143658Z | Latest main baseline | Timeout, code 159 | 30.04s | 30.024s | 94,173,262 | 8,639,234,216 | 171,001,452 | 1,285,556,239 | 0 |
| 840c1eea-1468-4cad-998a-d0ffac60076f | Current branch single compiled SQL | OK | 6.07s | 5.770s | 109,169,238 | 8,719,414,671 | 1,300,953,295 | 711,225,020 | 0 |

The current branch changes the query from a 30s timeout to a completed read in about 6s on active v19 tables. The active v19 schema does not have the newer relationship-kind projections from this repository, so ClickHouse is still scanning more than it should for the dynamic frontier sets.
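
For reference, the ratios implied by the two runs above can be recomputed with the small sketch below. The Run struct and function names are hypothetical; the numbers are copied from the Wall time, Read rows, and Memory columns. Because the baseline hit the timeout, its wall time is a floor rather than a completed measurement, so the speedup should be read as a lower bound.

```rust
/// Measurements copied from the validation table; field names are illustrative.
pub struct Run {
    wall_s: f64,
    read_rows: u64,
    memory: u64,
}

pub const BASELINE: Run = Run { wall_s: 30.04, read_rows: 94_173_262, memory: 1_285_556_239 };
pub const BRANCH: Run = Run { wall_s: 6.07, read_rows: 109_169_238, memory: 711_225_020 };

/// Baseline wall time divided by branch wall time (lower bound, see lead-in).
pub fn wall_time_speedup() -> f64 {
    BASELINE.wall_s / BRANCH.wall_s
}

/// Branch memory as a fraction of baseline memory.
pub fn memory_ratio() -> f64 {
    BRANCH.memory as f64 / BASELINE.memory as f64
}

/// Branch read rows as a multiple of baseline read rows.
pub fn read_rows_ratio() -> f64 {
    BRANCH.read_rows as f64 / BASELINE.read_rows as f64
}
```

These work out to roughly a 4.9x wall-time improvement at minimum, about 55% of the baseline memory, and about 16% more rows read than the baseline managed before it timed out.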

Relates to #390
