chore(querying): streamline & harden parameterization codepaths in graph engine
What does this MR do and why?
Problem
The query engine generated parameterized ClickHouse SQL but had no concept of types. Parameters were bare serde_json::Values, and the ClickHouse placeholder type (e.g. {p0:String}) was guessed at codegen time by peeking at the JSON variant. This caused three concrete issues:
- Type information was lost between compilation and execution. The query engine inferred types during codegen, emitted them into SQL placeholders, then threw them away. Downstream, bind_param in the ClickHouse client had to re-infer types from the JSON value: a second, independent guess that could disagree with what the SQL expected. For scalar values this happened to work; for arrays it couldn't, because a Value::Array doesn't tell you the element type.
- Array parameters required a hack. IN clauses with multiple values were emitted as N individual scalar parameters ({p0:String}, {p1:String}, ...). ClickHouse supports {p0:Array(String)} as a single parameter, which is both more efficient and simpler. But without a type system that could express Array(T), there was no way to emit this.
- Shared types lived in the wrong places. ClickHouse type enums belonged to the query engine, but the ClickHouse client needed them too. Arrow extraction utilities lived in gkg-server, but integration-testkit duplicated the same dispatch logic. There was no shared home for cross-cutting types.
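To make the first issue concrete, here is a minimal sketch of value-based type guessing. It is not the removed code: `Json` is a hypothetical stand-in for `serde_json::Value` (so the snippet needs no external crate), `guess_placeholder_type` is an illustrative name, and the choice of `Int64` for integers is an assumption.

```rust
/// Hypothetical stand-in for serde_json::Value, reduced to the variants used here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    String(String),
    Int(i64),
    Array(Vec<Json>),
}

/// Guess a ClickHouse placeholder type by looking only at the value.
/// Returns None when the value alone does not determine the type.
/// (Illustrative helper; the name and the Int64 mapping are assumptions.)
fn guess_placeholder_type(value: &Json) -> Option<String> {
    match value {
        Json::String(_) => Some("String".to_string()),
        Json::Int(_) => Some("Int64".to_string()),
        Json::Array(items) => {
            // An empty array carries no element to inspect, so the guess fails here.
            let first = guess_placeholder_type(items.first()?)?;
            // A mixed array is equally ambiguous: no single element type fits.
            let uniform = items
                .iter()
                .all(|item| guess_placeholder_type(item).as_deref() == Some(first.as_str()));
            if uniform {
                Some(format!("Array({})", first))
            } else {
                None
            }
        }
    }
}
```

Scalars resolve cleanly, but `Json::Array(vec![])` yields `None`: the value has nothing to say about its element type, which is why the type has to travel alongside the value rather than be recovered from it.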
Solution
- Typed parameters end-to-end. Introduced ChScalar and ChType enums that model ClickHouse's type system for parameters, including Array(T). Every parameter in the AST now carries its ClickHouse type explicitly via Expr::Param. This type flows through codegen into ParamValue, which is what gets stored in ParameterizedQuery.params. The ClickHouse client's bind_param now accepts &ChType directly: no string matching, no re-inference.
- Convenience builders. Expr::string(), Expr::int(), Expr::and(), Expr::col_in() make the lowering code read like intent rather than plumbing. col_in handles the 0/1/N dispatch (None, Eq, IN with Array) in one place.
- gkg-utils crate. Created a shared crate that houses types needed across crate boundaries:
  - clickhouse module: ChScalar, ChType, imported by both query-engine (AST construction) and clickhouse-client (parameter binding)
  - arrow module: ArrowUtils, ColumnValue, imported by gkg-server, available to integration-testkit without duplication
- Bounds validation. node_ids capped at 500, filter IN values capped at 100, enforced both in the JSON schema (fail-fast at validation) and in the AST checker (defense-in-depth).
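The shape of the typed-parameter approach can be sketched roughly as follows. Only the names ChScalar, ChType, Expr::Param and col_in come from this MR; the variants, the fields, the string-only `col_in` signature, the `render` helper and the `MAX_IN_VALUES` constant are assumptions made to keep the example self-contained, and the real definitions in gkg-utils and query-engine may differ.

```rust
use std::fmt;

// Assumed variants: the MR names the enums but not their contents.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChScalar { String, Int64 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ChType { Scalar(ChScalar), Array(ChScalar) }

impl fmt::Display for ChScalar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self { ChScalar::String => "String", ChScalar::Int64 => "Int64" })
    }
}

impl fmt::Display for ChType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ChType::Scalar(s) => write!(f, "{}", s),
            ChType::Array(s) => write!(f, "Array({})", s),
        }
    }
}

// Simplified AST: every Param carries its ClickHouse type next to its values.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Col(String),
    Param { ty: ChType, values: Vec<String> },
    Eq(Box<Expr>, Box<Expr>),
    In(Box<Expr>, Box<Expr>),
}

// Hypothetical constant mirroring the "filter IN values capped at 100" rule.
const MAX_IN_VALUES: usize = 100;

impl Expr {
    /// 0/1/N dispatch: no values -> no predicate, one -> Eq, many -> IN with an Array param.
    fn col_in(column: &str, values: Vec<String>) -> Result<Option<Expr>, String> {
        if values.len() > MAX_IN_VALUES {
            return Err(format!("IN list has {} values, limit is {}", values.len(), MAX_IN_VALUES));
        }
        let col = Box::new(Expr::Col(column.to_string()));
        Ok(match values.len() {
            0 => None,
            1 => Some(Expr::Eq(col, Box::new(Expr::Param { ty: ChType::Scalar(ChScalar::String), values }))),
            _ => Some(Expr::In(col, Box::new(Expr::Param { ty: ChType::Array(ChScalar::String), values }))),
        })
    }
}

/// Emit SQL text, numbering placeholders p0, p1, ... and taking the type from the AST.
fn render(expr: &Expr, next: &mut usize) -> String {
    match expr {
        Expr::Col(name) => name.clone(),
        Expr::Param { ty, .. } => {
            let placeholder = format!("{{p{}:{}}}", *next, ty);
            *next += 1;
            placeholder
        }
        Expr::Eq(l, r) => {
            let (left, right) = (render(l, next), render(r, next));
            format!("{} = {}", left, right)
        }
        Expr::In(l, r) => {
            let (left, right) = (render(l, next), render(r, next));
            format!("{} IN {}", left, right)
        }
    }
}
```

In this sketch a two-value filter renders as a single `{p0:Array(String)}` placeholder instead of two scalar ones, and the same `ChType` that produced the placeholder is still available on the param at bind time, so nothing has to be re-inferred. Putting the cap inside `col_in` is only for brevity here; the MR enforces it in the JSON schema and the AST checker.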
Related Issues
Testing
All existing query compilation tests (search, traversal, aggregation, path-finding, neighbors, multi-hop, pagination) continue to pass, and the parameterized SQL output is identical. The assertions themselves are unchanged; in some cases only the way tests access the results has changed, to account for the new structs.
Performance Analysis
- This merge request does not introduce any performance regression. If a performance regression is expected, explain why.
Edited by Michael Usachenko