chore(querying): streamline & harden parameterization codepaths in graph engine

What does this MR do and why?

Problem

The query engine generated parameterized ClickHouse SQL but had no concept of types. Parameters were bare serde_json::Values, and the ClickHouse placeholder type (e.g. {p0:String}) was guessed at codegen time by peeking at the JSON variant. This caused three concrete issues:

  1. Type information was lost between compilation and execution. The query engine inferred types during codegen, emitted them into SQL placeholders, then threw them away. Downstream, bind_param in the ClickHouse client had to re-infer types from the JSON value — a second, independent guess that could disagree with what the SQL expected. For scalar values this happened to work; for arrays it couldn't, because a Value::Array doesn't tell you the element type.
  2. Array parameters required a hack. IN clauses with multiple values were emitted as N individual scalar parameters ({p0:String}, {p1:String}, ...). ClickHouse supports {p0:Array(String)} as a single parameter, which is both more efficient and simpler. But without a type system that could express Array(T), there was no way to emit this.
  3. Shared types lived in the wrong places. ClickHouse type enums belonged to the query engine, but the ClickHouse client needed them too. Arrow extraction utilities lived in gkg-server, but integration-testkit duplicated the same dispatch logic. There was no shared home for cross-cutting types.

Solution

  1. Typed parameters end-to-end. Introduced ChScalar and ChType enums that model ClickHouse's type system for parameters, including Array(T). Every parameter in the AST now carries its ClickHouse type explicitly via Expr::Param. This type flows through codegen into ParamValue, which is what gets stored in ParameterizedQuery.params. The ClickHouse client's bind_param now accepts &ChType directly — no string matching, no re-inference.
  2. Convenience builders. Expr::string(), Expr::int(), Expr::and(), Expr::col_in() make the lowering code read like intent rather than plumbing. col_in handles the 0/1/N dispatch (None, Eq, IN with Array) in one place.
  3. gkg-utils crate. Created a shared crate that houses types needed across crate boundaries:
    • clickhouse module: ChScalar, ChType — imported by both query-engine (AST construction) and clickhouse-client (parameter binding)
    • arrow module: ArrowUtils, ColumnValue — imported by gkg-server, available to integration-testkit without duplication
  4. Bounds validation. node_ids capped at 500, filter IN values capped at 100 — enforced both in the JSON schema (fail-fast at validation) and in the AST checker (defense-in-depth).

Testing

All existing query compilation tests (search, traversal, aggregation, path-finding, neighbors, multi-hop, pagination) continue to pass unchanged, the parameterized SQL output is identical. All assertions are identical, but in some cases access patterns have changed to account for the usage of new structs.

Performance Analysis

  • This merge request does not introduce any performance regression. If a performance regression is expected, explain why.
Edited by Michael Usachenko

Merge request reports

Loading