fix(code-graph): mask sign bit so node ids are always positive
Summary
Definition, File, Directory, ImportedSymbol, and Branch ids are content-hashed via FxHasher::finish() (a u64) and stored as Int64. The as i64 cast reinterpreted the bits, so any hash with the high bit set surfaced as a negative value like "-3105496773625129529" in the API. About half of all ids hit that case.
This MR masks the sign bit in compute_id and compute_branch_id so ids stay in [0, 2^63). Same precedent already in workspace::project_id_from_path.
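A minimal sketch of the difference between the bare cast and the masked cast. The function names `cast_id` and `masked_id` and the constant `SIGN_MASK` are illustrative assumptions, not the actual bodies of compute_id or compute_branch_id, and a plain u64 stands in for the FxHasher::finish() output:

```rust
/// Clears bit 63, the sign bit of a two's-complement i64.
/// Hypothetical constant name; equals 0x7FFF_FFFF_FFFF_FFFF.
const SIGN_MASK: u64 = i64::MAX as u64;

/// Old behavior (illustrative): reinterpret the 64 hash bits as i64.
/// Any hash with the high bit set comes out negative.
fn cast_id(hash: u64) -> i64 {
    hash as i64
}

/// Masked behavior (illustrative): drop the high bit before the cast,
/// so the result always lies in [0, 2^63).
fn masked_id(hash: u64) -> i64 {
    (hash & SIGN_MASK) as i64
}
```

Masking discards one bit of the hash, so ids carry 63 bits instead of 64. Hashes below 2^63 are unchanged by the mask.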
Relates to #475.
Verification
Before, against production:
{"type":"Definition","id":"-514837287758239620","name":"compute_id","fqn":"code_graph::v2::linker::graph::compute_id"}
After (next reindex), the same row will land in the positive range. Existing rows keep their old ids, roughly half of them negative, until reindexed: id is part of the order key (traversal_path, project_id, branch, id), so a re-hashed row gets a different key and ReplacingMergeTree cannot dedupe it against the old one.
Why mask, not switch to u64 end-to-end
The full pipeline is locked to signed Int64: ontology YAML, config/graph.sql, the Arrow BatchBuilder::push_int(i64) API, DuckDB BIGINT, the DSL node_ids parser via serde_json::Number::as_i64(), and the UInt64Array -> Int64 clamp in crates/utils/src/arrow.rs. ADR 004 also documents stringified Int64 for the wire format. A real u64 would touch every one of those without changing what callers see, since the response already serializes ids as JSON strings.
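As a rough illustration of the constraint, using only the standard library: a full-range u64 does not always fit the signed type the pipeline expects, while a masked value always does. The helpers `fits_signed` and `wire_id` are hypothetical names for this sketch and do not exist in the codebase:

```rust
/// Hypothetical helper: does this hash fit in a signed 64-bit column
/// without changing its numeric value?
fn fits_signed(hash: u64) -> bool {
    i64::try_from(hash).is_ok()
}

/// Hypothetical helper: ids go over the wire as decimal strings,
/// so callers see the sign character if the stored value is negative.
fn wire_id(id: i64) -> String {
    id.to_string()
}
```

Under these assumptions, `fits_signed(u64::MAX)` is false, whereas any value with the high bit cleared passes, which is why the mask needs no schema or parser change.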
Issue #475 tracks the larger discussion (UUID, hex, longer hash) and stays open. This MR is the small, no-schema-change fix.
Test plan
- cargo test -p code-graph v2::linker::graph::tests::compute_id_is_always_non_negative
- cargo test -p indexer modules::code::arrow_converter::tests::compute_branch_id_is_always_non_negative
- mise lint:code
- After merge and reindex of one project, query a Definition and confirm the id no longer starts with -