fix(query): prune filtered path-finding frontiers
Summary
This fixes the filtered path-finding timeout by pruning code graph frontiers earlier and narrowing code path searches to traversal paths that contain both filtered endpoints.
The traversal-path join is only used when every endpoint and edge table involved exposes traversal_path, so User path-finding queries do not take this path.
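
The gating rule above can be sketched as a small predicate. This is an illustration only, not the compiler's actual code: `TableInfo`, `can_use_traversal_path_scope`, and the table names in the usage note are hypothetical, and the empty-endpoint guard is an added assumption.

```rust
/// Hypothetical description of a graph table (not the compiler's real type).
#[derive(Debug, Clone)]
pub struct TableInfo {
    pub name: String,
    pub columns: Vec<String>,
}

impl TableInfo {
    pub fn new(name: &str, columns: &[&str]) -> Self {
        Self {
            name: name.to_string(),
            columns: columns.iter().map(|c| c.to_string()).collect(),
        }
    }

    /// True when the table has a `traversal_path` column.
    pub fn exposes_traversal_path(&self) -> bool {
        self.columns.iter().any(|c| c == "traversal_path")
    }
}

/// Returns true only when every endpoint table and every edge table exposes
/// `traversal_path`. A single table without the column disables the scoped join.
pub fn can_use_traversal_path_scope(endpoints: &[TableInfo], edges: &[TableInfo]) -> bool {
    // Assumption: a query with no endpoints never takes the scoped path,
    // because `all` on an empty iterator would otherwise return true.
    !endpoints.is_empty()
        && endpoints
            .iter()
            .chain(edges.iter())
            .all(TableInfo::exposes_traversal_path)
}
```

With two Definition-like endpoint tables and an edge table that all carry `traversal_path`, the predicate returns true; swapping one endpoint for a User-like table without the column makes it return false, which matches the behaviour described for User path-finding queries.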
Changes
- Add traversal-path scoped pruning for path_finding when endpoint and edge tables support traversal_path.
- Keep filtered endpoint hop frontiers anchored through nf* CTEs and apply relationship-kind filtering to those hop CTEs.
- Carry traversal-path scope and endpoint kind into path hop-frontier CTEs so deeper hop filters stay narrow.
- Preserve lowerer-selected nf* columns and push filtered Definition endpoint predicates into LIMIT 1 BY.
- Keep latest-main traversal-path trie collapse and compact the resulting filters with arrayExists.
- Add data-correctness coverage for filtered Definition path finding staying on one traversal_path.
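
As a rough illustration of the prefix collapse and `arrayExists` compaction listed above, the sketch below drops traversal paths already covered by a shorter prefix and renders the remainder as one ClickHouse predicate. It uses a sorted scan instead of a real trie, and it assumes paths end with `/` so that `1/2/` does not cover `1/20/`. The function names and the literal escaping are hypothetical; the actual compiler may bind parameters instead of inlining literals.

```rust
/// Drop every path that is already covered by a shorter prefix in the set.
/// Assumes each path ends with '/' so prefixes align on segment boundaries.
pub fn collapse_prefixes(paths: &[&str]) -> Vec<String> {
    let mut sorted: Vec<&str> = paths.to_vec();
    sorted.sort_unstable();
    sorted.dedup();

    let mut kept: Vec<String> = Vec::new();
    for p in sorted {
        // In sorted order a covering prefix precedes everything it covers,
        // so checking the most recently kept prefix is sufficient.
        let covered = kept.last().map_or(false, |k| p.starts_with(k.as_str()));
        if !covered {
            kept.push(p.to_string());
        }
    }
    kept
}

/// Minimal string-literal escaping for illustration only.
fn escape_literal(s: &str) -> String {
    s.replace('\\', "\\\\").replace('\'', "\\'")
}

/// Render one `arrayExists` predicate over the collapsed prefixes.
/// Returns None when there is nothing to filter on.
pub fn traversal_path_filter(column: &str, paths: &[&str]) -> Option<String> {
    let prefixes = collapse_prefixes(paths);
    if prefixes.is_empty() {
        return None;
    }
    let literals: Vec<String> = prefixes
        .iter()
        .map(|p| format!("'{}'", escape_literal(p)))
        .collect();
    Some(format!(
        "arrayExists(p -> startsWith({}, p), [{}])",
        column,
        literals.join(", ")
    ))
}
```

For the inputs `1/2/3/`, `1/2/`, and `1/20/`, the collapse keeps `1/2/` and `1/20/`, and the rendered predicate is a single `arrayExists` call over those two prefixes rather than a chain of `OR`-ed `startsWith` conditions.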
Base
- Rebased on latest origin/main: e558c36a
- Branch head: b429a88104d2e56c24ab2682642dc5577b068cfe
Validation
- mise exec -- cargo fmt --all --check
- mise lint:code
- mise exec -- cargo test -p compiler path_finding_ --lib
- mise exec -- cargo test -p compiler security --lib
- mise exec -- cargo test -p integration-testkit splits_data_correctness_seed
- cargo test -p integration-tests --test containers data_correctness -- --nocapture
- git diff --check
Read-Only ClickHouse Validation
Compiled the same Definition path_finding query locally (compile to run_query), then ran the resulting SELECT-only SQL through kubectl as gkg_reader against the active v19 graph tables with the query cache disabled. No mutations were run.
| Run | Query shape | Status | Wall time | CH elapsed | Read rows | Read bytes | Rows to read | Memory | Result rows |
|---|---|---|---|---|---|---|---|---|---|
| codex_pf_before_main_20260429T143658Z | Latest main baseline | Timeout, code 159 | 30.04s | 30.024s | 94,173,262 | 8,639,234,216 | 171,001,452 | 1,285,556,239 | 0 |
| 840c1eea-1468-4cad-998a-d0ffac60076f | Current branch single compiled SQL | OK | 6.07s | 5.770s | 109,169,238 | 8,719,414,671 | 1,300,953,295 | 711,225,020 | 0 |
The current branch turns the 30s timeout (code 159) into a completed read in about 6s on the active v19 tables, and reported memory drops from 1,285,556,239 to 711,225,020. The active v19 schema does not have the newer relationship-kind projections from this repository, so ClickHouse still scans more than necessary for the dynamic frontier sets.
Relates to #390