chore(graphsec): harden engine + query pipeline w/ more gating and test coverage

What does this MR do and why?

Problem

The query pipeline had three categories of issues:

1. Security filters didn't recurse into derived tables. security.rs skipped TableRef::Union and TableRef::Subquery entirely — the rationale ("only gl_edge scans go through union") lived in a comment, not in code. check.rs had the same gap: it validated subqueries but skipped union arms. apply_to_query recursed into top-level q.union_all arms but not into TableRef::Union (a derived table inside FROM). These are different AST constructs, and only the first was covered. If anything changed about what tables appeared in unions, security filters would silently not apply.

2. enforce.rs injected columns for aliases that didn't exist in FROM. enforce_return_columns walked every input node and blindly injected SELECT columns referencing its alias, even when that alias wasn't in the query's FROM clause. This fails closed (ClickHouse column-not-found), but the error surfaces far from the cause and the intent was never validated structurally.

3. Scattered hardcoded strings and a misplaced constant. "_gkg_" was hardcoded across hydration.rs and formatter.rs. HYDRATION_NODE_ALIAS lived in gkg-server/constants.rs instead of query-engine where the rest of the column-naming logic lives. build_dynamic_search_query silently fell back to "*" when an entity wasn't in the ontology, and hydrated property keys came back with alias prefixes (hydrate_username instead of username).

Solution

Recursive security filter and check injection

Added apply_security_to_from() in security.rs — walks the TableRef tree and recurses into Union arms (each is a full Query with its own where_clause) and Subquery inner queries. Also wired apply_to_query to recurse into top-level q.union_all arms. Removed the old assert_union_has_no_sensitive_tables / sensitive_table_in_from helpers that were the "panic if unexpected" approach.

Same pattern in check.rs: renamed check_subqueries_in_from to check_derived_tables_in_from, added the Union arm, and added recursion into q.union_all arms on the main check_query path.

Unreferenced node and duplicate ID validation

Added check_unreferenced_nodes to the validator — every declared node must be referenced by at least one structural field (relationship from/to, aggregation target/group_by, path from/to). Runs after individual reference checks so "undefined node X" errors take priority over "node Y is unreferenced". This replaces a runtime guard that was previously in enforce.rs — the problem is a malformed input, so it belongs in the validator.

Also added check_duplicate_node_ids — two nodes with the same id would silently collide downstream, now rejected early.

codegen UNION ALL fix

emit_query delegated to emit_query_body only for CTE definitions — top-level queries had an inlined copy that didn't emit UNION ALL arms. Extracted the shared emit_query_body method so both paths use it. UNION ALL now works at the top level, not just inside CTEs.

Constants and hydration cleanup

Moved HYDRATION_NODE_ALIAS to query-engine/constants.rs (value: "hydrate"), re-exported from lib.rs. Replaced all hardcoded "_gkg_" in hydration.rs and formatter.rs with GKG_COLUMN_PREFIX, NEIGHBOR_ID_COLUMN, NEIGHBOR_TYPE_COLUMN, RELATIONSHIP_TYPE_COLUMN, and the redaction_id_column() helper.

build_dynamic_search_query now returns Result — missing ontology entity is an error, not a silent "*" fallback. parse_property_batches strips the alias prefix from property keys so consumers see username instead of hydrate_username. formatter.rs switched from prefix matching (starts_with("_gkg_neighbor")) to exact column name comparisons.

Testing

Split the monolithic hydration_full_pipeline test into focused subtests and added the redact-then-hydrate flow that was previously untested. Extracted shared helpers (MockRedactionService, compile_and_execute, run_redaction) from redaction_integration.rs into common/mod.rs so both test files use them. Added new redaction subtests for MergeRequest entities, row order preservation, empty path finding, and cross-entity ID collision.

Added/Changed Tests:

Area Test What it verifies
Area Test What it verifies
--- --- ---
security.rs inject_recurses_into_union_from_arms Filters applied to each TableRef::Union arm in FROM
security.rs inject_recurses_into_union_all_arms Filters applied to top-level q.union_all arms
security.rs union_aliases_are_not_collected Union outer alias not treated as a filterable scan
check.rs rejects_union_arm_missing_security_filter Union arm in FROM without filter is caught
check.rs accepts_union_arm_with_security_filter Union arm with filter passes
check.rs rejects_union_all_arm_without_security_filter Top-level UNION ALL arm without filter is caught
check.rs accepts_union_all_arms_with_security_filters Top-level UNION ALL arms with filters pass
check.rs rejects_cte_with_sensitive_table_missing_filter CTE scanning gl_* without filter is caught
check.rs accepts_cte_with_security_filter CTE with filter passes
enforce.rs default_entity_does_not_emit_pk_column No _gkg_*_pk when redaction_id_column == "id"
enforce.rs uses_correct_redaction_id_column_per_node Custom redaction_id_column with multi-table FROM
validate.rs rejects_duplicate_node_ids Two nodes with same id rejected
validate.rs accepts_unique_node_ids Distinct node IDs accepted
validate.rs rejects_unreferenced_traversal_node Orphan node not in any relationship rejected
validate.rs accepts_aggregation_node_referenced_only_by_target Node referenced only by aggregation target passes
codegen.rs union_all_in_top_level_query UNION ALL emitted at top level, LIMIT after
codegen.rs union_all_in_cte_body WITH RECURSIVE + UNION ALL in CTE
codegen.rs table_ref_union_emits_derived_table TableRef::Union emits (... UNION ALL ...) AS alias
hydration_integration.rs path_finding_dynamic_hydration Path nodes hydrated with properties
hydration_integration.rs path_finding_hydrated_property_values Correct property values on hydrated path nodes
hydration_integration.rs path_finding_json_format Formatter emits path array with properties and edges
hydration_integration.rs neighbors_dynamic_hydration Neighbor nodes hydrated
hydration_integration.rs neighbors_hydrated_property_values Correct property values on hydrated neighbors
hydration_integration.rs neighbors_json_format Formatter merges neighbor properties as top-level keys
hydration_integration.rs search_produces_no_hydration_plan Search queries skip hydration
hydration_integration.rs traversal_produces_no_hydration_plan Traversal queries skip hydration
hydration_integration.rs path_finding_hydration_after_partial_redaction Redact then hydrate: denied paths removed, survivors hydrated
hydration_integration.rs neighbors_hydration_after_partial_redaction Redact then hydrate: denied neighbor rows removed
hydration_integration.rs path_finding_all_denied_then_hydrate Full denial produces zero rows after hydration
redaction_integration.rs search_merge_requests_with_redaction MergeRequest entity redaction
redaction_integration.rs redaction_preserves_row_order ASC order preserved after partial redaction
redaction_integration.rs redaction_preserves_row_order_desc DESC order preserved after partial redaction
redaction_integration.rs path_finding_no_path_exists_returns_empty No paths = zero rows, zero redactions
redaction_integration.rs cross_entity_id_collision_redaction Same numeric ID across entity types redacted independently

Performance Analysis

  • This merge request does not introduce any performance regression. If a performance regression is expected, explain why.
Edited by Michael Usachenko

Merge request reports

Loading