[indexer] Schema version tracking with table prefix support
## Problem

GKG needs to track which schema version is active and derive table prefixes from it. This extends the V0 schema version tracking (Issue #426) with prefix derivation logic and a retention config.

## Proposal

### Schema version constant

Embed `SCHEMA_VERSION: u32` as a compile-time constant in the binary (initially `0`).

### ClickHouse control table

```sql
CREATE TABLE IF NOT EXISTS gkg_schema_version (
    version UInt32,
    status Enum8('active' = 1, 'migrating' = 2, 'retired' = 3, 'dropped' = 4),
    created_at DateTime DEFAULT now()
) ENGINE = ReplacingMergeTree(created_at)
ORDER BY version;
```

This table survives across schema versions: it is never prefixed or dropped.

### Table prefix derivation

```rust
fn table_prefix(schema_version: u32) -> String {
    if schema_version == 0 {
        String::new() // no prefix for v0 (backward compatible)
    } else {
        format!("v{}_", schema_version)
    }
}

fn prefixed_table_name(table: &str, schema_version: u32) -> String {
    format!("{}{}", table_prefix(schema_version), table)
}
```

### Configuration

```yaml
schema:
  max_retained_versions: 2  # total table sets to keep (default: 2)
```

With the default of 2: after migrating to v2, the indexer keeps v2 (active) + v1 (rollback target), and drops v0 tables automatically.

### Schema version file

Store the version in a dedicated file `config/SCHEMA_VERSION` containing just the integer (e.g. `0`). This is simpler to diff-check than a Rust constant buried in source code. The Rust binary reads this at compile time via `include_str!` and parses it into a `u32`.

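A minimal sketch of the compile-time parse, assuming the file holds a single unsigned integer optionally surrounded by whitespace. The helper name `parse_schema_version` and the relative path in the comment are hypothetical, not taken from !824; the example constant uses a literal so the snippet compiles without the file.

```rust
/// Returns true for the ASCII whitespace we tolerate around the number
/// (for example the trailing newline most editors add).
const fn is_ws(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\r' | b'\n')
}

/// Parses a decimal `u32` in a `const` context. Any malformed input or
/// overflow panics, which turns into a build error when used in a `const`.
const fn parse_schema_version(raw: &str) -> u32 {
    let bytes = raw.as_bytes();
    let mut i = 0;
    // Skip leading whitespace.
    while i < bytes.len() && is_ws(bytes[i]) {
        i += 1;
    }
    let mut value: u32 = 0;
    let mut digits = 0;
    // Accumulate digits with checked arithmetic.
    while i < bytes.len() && bytes[i] >= b'0' && bytes[i] <= b'9' {
        let d = (bytes[i] - b'0') as u32;
        value = match value.checked_mul(10) {
            Some(v) => match v.checked_add(d) {
                Some(v) => v,
                None => panic!("schema version overflows u32"),
            },
            None => panic!("schema version overflows u32"),
        };
        digits += 1;
        i += 1;
    }
    // Skip trailing whitespace; anything else left over is an error.
    while i < bytes.len() && is_ws(bytes[i]) {
        i += 1;
    }
    if digits == 0 || i != bytes.len() {
        panic!("config/SCHEMA_VERSION must contain a single unsigned integer");
    }
    value
}

// In the real crate this would presumably read the file instead, e.g.
//   const SCHEMA_VERSION: u32 =
//       parse_schema_version(include_str!("../config/SCHEMA_VERSION"));
// (the relative path is an assumption about the crate layout).
const EXAMPLE_SCHEMA_VERSION: u32 = parse_schema_version("0\n");
```

Because the parse runs in a `const` context, a malformed `config/SCHEMA_VERSION` would fail the build rather than surface at startup.
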
### CI + lefthook enforcement

**CI job** (`schema-version-check`, added to `.gitlab-ci.yml` lint stage):

```bash
#!/usr/bin/env bash
# scripts/check-schema-version.sh
set -euo pipefail

BASE_REF="${CI_MERGE_REQUEST_DIFF_BASE_SHA:-origin/main}"

# Escape hatch for non-schema changes. Assumes GitLab's predefined
# CI_MERGE_REQUEST_DESCRIPTION variable is set in MR pipelines; it is
# empty for local (lefthook) runs, so the check always runs there.
if [[ "${CI_MERGE_REQUEST_DESCRIPTION:-}" == *"[skip schema-version-check]"* ]]; then
  echo "Schema version check skipped via MR description."
  exit 0
fi

# Capture diffs first so `grep -q` exiting early cannot trip `pipefail`.
changed_files="$(git diff --name-only "$BASE_REF"...HEAD)"

# Check if schema-affecting files changed in this MR
if grep -qE '^(config/graph\.sql|config/graph_local\.sql|config/ontology/)' <<<"$changed_files"; then
  # Schema or ontology changed: SCHEMA_VERSION must also be bumped
  version_diff="$(git diff "$BASE_REF"...HEAD -- config/SCHEMA_VERSION)"
  if ! grep -q '^+[0-9]' <<<"$version_diff"; then
    echo "ERROR: config/graph.sql, config/graph_local.sql, or config/ontology/ changed but config/SCHEMA_VERSION was not bumped."
    echo "If this change affects the ClickHouse schema, bump the version."
    echo "If this is a non-schema change (e.g. comments, formatting), you can skip this check"
    echo "by adding [skip schema-version-check] to the MR description."
    exit 1
  fi
fi

echo "Schema version check passed."
```

The CI job follows the existing pattern (`agent-file-sync-check`, `ontology-schema-validate`): lightweight alpine image, MR-only, lint stage. It also supports a `[skip schema-version-check]` escape hatch for non-schema ontology changes (e.g. description updates).

**Lefthook pre-commit hook** (added to `lefthook.yml`):

```yaml
- name: schema-version-check
  run: ./scripts/check-schema-version.sh
  glob:
    - "config/graph.sql"
    - "config/graph_local.sql"
    - "config/ontology/**/*.yaml"
```

This gives developers immediate local feedback, matching the pattern used for `agent-file-sync` and `ontology-schema` checks.

### Startup behavior

All service modes (webserver, indexer, dispatcher) read the active schema version from `gkg_schema_version` on startup. If the table doesn't exist, it is created and version 0 is recorded as `active`.

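One way the startup lookup could resolve the active version from the control table's rows is sketched below. Because `ReplacingMergeTree` only deduplicates during background merges, a plain `SELECT` may return several rows per `version`, so the sketch keeps the newest row per version before looking at `status`. The `VersionRow` and `Status` types, the function name, and the "highest active version wins" tie-break are all assumptions for illustration, not part of !824.

```rust
use std::collections::HashMap;

/// Mirrors the `status` Enum8 of `gkg_schema_version`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Active,
    Migrating,
    Retired,
    Dropped,
}

/// One row as fetched from the control table (hypothetical shape;
/// `created_at` is modelled as Unix seconds).
#[derive(Debug, Clone, Copy)]
struct VersionRow {
    version: u32,
    status: Status,
    created_at: u64,
}

/// Collapses duplicate rows to the newest per version, then returns the
/// highest version whose newest row is `Active`, if any.
fn resolve_active_version(rows: &[VersionRow]) -> Option<u32> {
    let mut latest: HashMap<u32, VersionRow> = HashMap::new();
    for row in rows {
        // Replace the stored row only when this one is strictly newer.
        let newer = latest
            .get(&row.version)
            .map_or(true, |seen| row.created_at > seen.created_at);
        if newer {
            latest.insert(row.version, *row);
        }
    }
    latest
        .values()
        .filter(|r| r.status == Status::Active)
        .map(|r| r.version)
        .max()
}
```

When the table was just created, the service would insert `(0, 'active')` as described above and proceed with version 0. How to react to a `None` result on an existing table is left open here.
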
## Acceptance criteria

- [ ] `SCHEMA_VERSION` stored in `config/SCHEMA_VERSION` file, read at compile time
- [ ] `gkg_schema_version` ClickHouse table created if not exists on startup
- [ ] `table_prefix()` and `prefixed_table_name()` functions implemented
- [ ] `schema.max_retained_versions` config setting with default of 2
- [ ] CI job (`schema-version-check`): fails MR if schema/ontology changes without version bump
- [ ] Lefthook pre-commit hook: same check locally
- [ ] Active schema version readable from ClickHouse by all service modes
- [ ] Unit tests for prefix derivation and version comparison

## Existing implementation to build on

- **!824** (`feat(indexer): add V0 schema version tracking and mismatch detection`): already implements `SCHEMA_VERSION` constant, `gkg_schema_version` table, CI check script, periodic mismatch detection, and integration tests. V0.5 extends this with table prefix derivation, `max_retained_versions` config, and the `status` column in the control table.
- **!809** (`feat(migration): add distributed lock and reconciler`): NATS KV distributed lock implementation that can be reused for the migration lock in Issue 3.

## Dependencies

Extends V0 Issue #426 (schema version tracking and mismatch detection).

## Blocks

- Issue 3: Table-prefix-aware indexer
- Issue 4: Table-prefix-aware web server
- Issue 5: Migration completion and cleanup

issue