chore(graphsec): harden engine + query pipeline w/ more gating and test coverage
What does this MR do and why?
Problem
The query pipeline had three categories of issues:
1. Security filters didn't recurse into derived tables. security.rs skipped TableRef::Union and TableRef::Subquery entirely — the rationale ("only gl_edge scans go through union") lived in a comment, not in code. check.rs had the same gap: it validated subqueries but skipped union arms. apply_to_query recursed into top-level q.union_all arms but not into TableRef::Union (a derived table inside FROM). These are different AST constructs, and only the first was covered. If anything changed about what tables appeared in unions, security filters would silently not apply.
2. enforce.rs injected columns for aliases that didn't exist in FROM. enforce_return_columns walked every input node and blindly injected SELECT columns referencing its alias, even when that alias wasn't in the query's FROM clause. This fails closed (ClickHouse column-not-found), but the error surfaces far from the cause and the intent was never validated structurally.
3. Scattered hardcoded strings and a misplaced constant. "_gkg_" was hardcoded across hydration.rs and formatter.rs. HYDRATION_NODE_ALIAS lived in gkg-server/constants.rs instead of query-engine where the rest of the column-naming logic lives. build_dynamic_search_query silently fell back to "*" when an entity wasn't in the ontology, and hydrated property keys came back with alias prefixes (hydrate_username instead of username).
Solution
Recursive security filter and check injection
Added apply_security_to_from() in security.rs — walks the TableRef tree and recurses into Union arms (each is a full Query with its own where_clause) and Subquery inner queries. Also wired apply_to_query to recurse into top-level q.union_all arms. Removed the old assert_union_has_no_sensitive_tables / sensitive_table_in_from helpers that were the "panic if unexpected" approach.
Same pattern in check.rs: renamed check_subqueries_in_from to check_derived_tables_in_from, added the Union arm, and added recursion into q.union_all arms on the main check_query path.
Unreferenced node and duplicate ID validation
Added check_unreferenced_nodes to the validator — every declared node must be referenced by at least one structural field (relationship from/to, aggregation target/group_by, path from/to). Runs after individual reference checks so "undefined node X" errors take priority over "node Y is unreferenced". This replaces a runtime guard that was previously in enforce.rs — the problem is a malformed input, so it belongs in the validator.
Also added check_duplicate_node_ids — two nodes with the same id would silently collide downstream, now rejected early.
codegen UNION ALL fix
emit_query delegated to emit_query_body only for CTE definitions — top-level queries had an inlined copy that didn't emit UNION ALL arms. Extracted the shared emit_query_body method so both paths use it. UNION ALL now works at the top level, not just inside CTEs.
Constants and hydration cleanup
Moved HYDRATION_NODE_ALIAS to query-engine/constants.rs (value: "hydrate"), re-exported from lib.rs. Replaced all hardcoded "_gkg_" in hydration.rs and formatter.rs with GKG_COLUMN_PREFIX, NEIGHBOR_ID_COLUMN, NEIGHBOR_TYPE_COLUMN, RELATIONSHIP_TYPE_COLUMN, and the redaction_id_column() helper.
build_dynamic_search_query now returns Result — missing ontology entity is an error, not a silent "*" fallback. parse_property_batches strips the alias prefix from property keys so consumers see username instead of hydrate_username. formatter.rs switched from prefix matching (starts_with("_gkg_neighbor")) to exact column name comparisons.
Related Issues
Testing
Split the monolithic hydration_full_pipeline test into focused subtests and added the redact-then-hydrate flow that was previously untested. Extracted shared helpers (MockRedactionService, compile_and_execute, run_redaction) from redaction_integration.rs into common/mod.rs so both test files use them. Added new redaction subtests for MergeRequest entities, row order preservation, empty path finding, and cross-entity ID collision.
Added/Changed Tests:
| Area | Test | What it verifies |
|---|---|---|
| Area | Test | What it verifies |
| --- | --- | --- |
| security.rs | inject_recurses_into_union_from_arms |
Filters applied to each TableRef::Union arm in FROM |
| security.rs | inject_recurses_into_union_all_arms |
Filters applied to top-level q.union_all arms |
| security.rs | union_aliases_are_not_collected |
Union outer alias not treated as a filterable scan |
| check.rs | rejects_union_arm_missing_security_filter |
Union arm in FROM without filter is caught |
| check.rs | accepts_union_arm_with_security_filter |
Union arm with filter passes |
| check.rs | rejects_union_all_arm_without_security_filter |
Top-level UNION ALL arm without filter is caught |
| check.rs | accepts_union_all_arms_with_security_filters |
Top-level UNION ALL arms with filters pass |
| check.rs | rejects_cte_with_sensitive_table_missing_filter |
CTE scanning gl_* without filter is caught |
| check.rs | accepts_cte_with_security_filter |
CTE with filter passes |
| enforce.rs | default_entity_does_not_emit_pk_column |
No _gkg_*_pk when redaction_id_column == "id"
|
| enforce.rs | uses_correct_redaction_id_column_per_node |
Custom redaction_id_column with multi-table FROM |
| validate.rs | rejects_duplicate_node_ids |
Two nodes with same id rejected |
| validate.rs | accepts_unique_node_ids |
Distinct node IDs accepted |
| validate.rs | rejects_unreferenced_traversal_node |
Orphan node not in any relationship rejected |
| validate.rs | accepts_aggregation_node_referenced_only_by_target |
Node referenced only by aggregation target passes |
| codegen.rs | union_all_in_top_level_query |
UNION ALL emitted at top level, LIMIT after |
| codegen.rs | union_all_in_cte_body |
WITH RECURSIVE + UNION ALL in CTE |
| codegen.rs | table_ref_union_emits_derived_table |
TableRef::Union emits (... UNION ALL ...) AS alias
|
| hydration_integration.rs | path_finding_dynamic_hydration |
Path nodes hydrated with properties |
| hydration_integration.rs | path_finding_hydrated_property_values |
Correct property values on hydrated path nodes |
| hydration_integration.rs | path_finding_json_format |
Formatter emits path array with properties and edges |
| hydration_integration.rs | neighbors_dynamic_hydration |
Neighbor nodes hydrated |
| hydration_integration.rs | neighbors_hydrated_property_values |
Correct property values on hydrated neighbors |
| hydration_integration.rs | neighbors_json_format |
Formatter merges neighbor properties as top-level keys |
| hydration_integration.rs | search_produces_no_hydration_plan |
Search queries skip hydration |
| hydration_integration.rs | traversal_produces_no_hydration_plan |
Traversal queries skip hydration |
| hydration_integration.rs | path_finding_hydration_after_partial_redaction |
Redact then hydrate: denied paths removed, survivors hydrated |
| hydration_integration.rs | neighbors_hydration_after_partial_redaction |
Redact then hydrate: denied neighbor rows removed |
| hydration_integration.rs | path_finding_all_denied_then_hydrate |
Full denial produces zero rows after hydration |
| redaction_integration.rs | search_merge_requests_with_redaction |
MergeRequest entity redaction |
| redaction_integration.rs | redaction_preserves_row_order |
ASC order preserved after partial redaction |
| redaction_integration.rs | redaction_preserves_row_order_desc |
DESC order preserved after partial redaction |
| redaction_integration.rs | path_finding_no_path_exists_returns_empty |
No paths = zero rows, zero redactions |
| redaction_integration.rs | cross_entity_id_collision_redaction |
Same numeric ID across entity types redacted independently |
Performance Analysis
- This merge request does not introduce any performance regression. If a performance regression is expected, explain why.