feat(indexer): add plan module with AST, codegen, and ontology-driven pipeline plans

What does this MR do and why?

MR 2 of the SDLC v2 integration plan. Adds the structured query pipeline types that will replace the string-templated SQL in prepare.rs. Nothing is wired in yet; this is purely additive with no behavior change.

New files

plan/ast.rs — Minimal SQL AST (Query, Expr, SelectExpr, TableRef, Op). Only models what the ETL pipeline needs. Expr::Raw is the escape hatch for ClickHouse-specific fragments.

plan/codegen.rs — Walks the AST and emits SQL strings.
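To make the AST/codegen split concrete, here is a minimal sketch of how the two could fit together. The type names come from this MR, but every field, variant, and the `emit_expr` / `emit_query` helpers are assumptions for illustration, not the actual definitions in plan/ast.rs or plan/codegen.rs.

```rust
// Hypothetical shapes: only the type names (Query, Expr, SelectExpr, TableRef, Op)
// are taken from the MR description. Fields and variants are assumed.
#[derive(Debug, Clone)]
enum Op {
    Eq,
    Gt,
    And,
}

#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(String), // an already-quoted SQL literal
    Binary(Box<Expr>, Op, Box<Expr>),
    Raw(String), // escape hatch for ClickHouse-specific fragments
}

struct SelectExpr {
    expr: Expr,
    alias: Option<String>,
}

struct TableRef {
    name: String,
}

struct Query {
    select: Vec<SelectExpr>,
    from: TableRef,
    filter: Option<Expr>,
}

// Recursively render one expression; binary nodes are always parenthesized.
fn emit_expr(e: &Expr) -> String {
    match e {
        Expr::Column(c) => c.clone(),
        Expr::Literal(l) => l.clone(),
        Expr::Raw(r) => r.clone(),
        Expr::Binary(l, op, r) => {
            let sym = match op {
                Op::Eq => "=",
                Op::Gt => ">",
                Op::And => "AND",
            };
            format!("({} {} {})", emit_expr(l), sym, emit_expr(r))
        }
    }
}

// Render a full SELECT statement from the AST.
fn emit_query(q: &Query) -> String {
    let cols: Vec<String> = q
        .select
        .iter()
        .map(|s| match &s.alias {
            Some(a) => format!("{} AS {}", emit_expr(&s.expr), a),
            None => emit_expr(&s.expr),
        })
        .collect();
    let mut sql = format!("SELECT {} FROM {}", cols.join(", "), q.from.name);
    if let Some(f) = &q.filter {
        sql.push_str(&format!(" WHERE {}", emit_expr(f)));
    }
    sql
}
```

Keeping string emission in one walker means callers build typed values and never concatenate SQL themselves, which is the property the string templates in prepare.rs lack.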

plan/from_ontology.rs — Builds PipelinePlan structs from ontology YAML. Handles both Table and Query ETL types, generates extract queries with watermark conditions, node transforms (int enum CASE expressions, column renaming), and FK edge transforms (multi-value delimiter splitting). Partitions plans into global vs namespaced.
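Two of the transforms mentioned above can be sketched in a few lines. This is an illustration only: `enum_case_sql` and `split_fk_values` are hypothetical helper names, and the real module presumably builds AST nodes rather than strings. The example column name and labels are invented too.

```rust
// Hypothetical helper: render an integer-enum column as a CASE expression
// that maps each stored integer to its label. No ELSE branch is emitted,
// so unmapped values evaluate to NULL (an assumption, not the MR's behavior).
fn enum_case_sql(column: &str, mapping: &[(i64, &str)]) -> String {
    let mut sql = format!("CASE {}", column);
    for (value, label) in mapping {
        // Double any single quote so the label stays a valid SQL string literal.
        let escaped = label.replace('\'', "''");
        sql.push_str(&format!(" WHEN {} THEN '{}'", value, escaped));
    }
    sql.push_str(" END");
    sql
}

// Hypothetical helper: split a multi-value FK column into individual target
// keys, one per edge. Blank segments (e.g. from a trailing delimiter) are dropped.
fn split_fk_values(raw: &str, delim: char) -> Vec<&str> {
    raw.split(delim)
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}
```

For example, a `state` column with the mapping `1 => opened, 2 => closed` would render as `CASE state WHEN 1 THEN 'opened' WHEN 2 THEN 'closed' END`, and a delimited value such as `1/2/3/` would yield three edge targets.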

plan/mod.rs — ExtractQuery owns cursor state and generates paginated SQL on demand via composite key DNF clauses. PipelinePlan and TransformOutput unify node and edge ETL into one abstraction.
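For readers unfamiliar with the term, "composite key DNF clauses" refers to expanding the row comparison `(k1, k2, ...) > (v1, v2, ...)` into an OR of AND-groups. A minimal sketch follows, assuming ascending order on every key and cursor values that are already rendered as SQL literals. `keyset_dnf` is a hypothetical name, not the function in this MR.

```rust
// Hypothetical helper: build the "rows after the cursor" predicate for a
// composite key in disjunctive normal form. Clause i pins the first i keys
// to the cursor values and requires key i to be strictly greater.
// Assumes ascending sort order and pre-rendered SQL literals in `cursor`.
fn keyset_dnf(keys: &[&str], cursor: &[&str]) -> String {
    assert_eq!(keys.len(), cursor.len(), "one cursor value per key column");
    let clauses: Vec<String> = (0..keys.len())
        .map(|i| {
            let mut parts: Vec<String> = (0..i)
                .map(|j| format!("{} = {}", keys[j], cursor[j]))
                .collect();
            parts.push(format!("{} > {}", keys[i], cursor[i]));
            format!("({})", parts.join(" AND "))
        })
        .collect();
    clauses.join(" OR ")
}
```

With keys `namespace_id, id` and cursor `7, 42` this produces `(namespace_id > 7) OR (namespace_id = 7 AND id > 42)`, which the next page's query would append to its WHERE clause alongside the watermark condition.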

checkpoint.rs — Checkpoint struct tracking watermark + cursor position, used by ExtractQuery::resume_from to pick up interrupted pagination.
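The resume flow can be pictured roughly as follows. Only `Checkpoint`, `ExtractQuery`, and `resume_from` are names from this MR. The fields and the `new`, `advance`, and `checkpoint` methods are assumptions made for the sketch.

```rust
// Hypothetical field layout: a watermark plus the composite key of the last
// row processed (None when no page has been read yet).
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    watermark: String,
    cursor: Option<Vec<String>>,
}

#[derive(Debug)]
struct ExtractQuery {
    watermark: String,
    cursor: Option<Vec<String>>,
}

impl ExtractQuery {
    // Assumed constructor: start a fresh extraction with no cursor.
    fn new(watermark: &str) -> Self {
        ExtractQuery { watermark: watermark.to_string(), cursor: None }
    }

    // Restore watermark and cursor so pagination continues after the last
    // row that was processed before the interruption.
    fn resume_from(&mut self, cp: &Checkpoint) {
        self.watermark = cp.watermark.clone();
        self.cursor = cp.cursor.clone();
    }

    // Assumed helper: record the key of the last row in the page just read.
    fn advance(&mut self, last_row_key: Vec<String>) {
        self.cursor = Some(last_row_key);
    }

    // Assumed helper: snapshot the current position for persistence.
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { watermark: self.watermark.clone(), cursor: self.cursor.clone() }
    }
}
```

A worker would snapshot `checkpoint()` after each page and, on restart, call `resume_from` on a fresh query before requesting the next page.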

Next MRs

  • MR 3: Expand checkpoint.rs with the ClickHouse persistence store
  • MR 4: Rewire pipeline and handlers to use the plan module (actual switchover)

Testing

Unit and integration tests.

Performance Analysis

  • This merge request does not introduce any performance regression.
