Optionally support Replicated tables in ClickHouse
Problem: Our current ClickHouse migration framework assumes that we use ClickHouse Cloud or a single-node ClickHouse instance.
This is a problem because on a multi-node ClickHouse instance database tables need to be created differently:
- Current:
ENGINE=MergeTree - On multi-node ClickHouse instance (Replicated DB engine):
ENGINE=ReplicatedMergeTree
Replacement regex:
"Engine =MergeTree".gsub(/(Engine\s*=\s*)(\w+)/i, '\1Replicating\2')
# Engine =ReplicatingMergeTree
Note: ensure that we do this replacement for MergeTree engines only.
Task 1: Update the ClickHouse migration framework to handle this seamlessly.
- Before the migration runs, look up the database engine.
- If the engine is replicated, on each SQL statement, scan for
ENGINEand replace the engine name automatically.
Query for determining the database engine:
SELECT engine
FROM system.databases WHERE name = 'database name'
LIMIT 1;
Task 2: Dealing with structure.sql
When structure.sql is written, always ensure we store the non-replicated engine name. When loading the structure.sql file, ensure that we properly replace the engine names with the replicated engine name if the database engine is replicated.
-
Atomic: no replacement needed -
Replicated: replacement needed
Note: ClickHouse Cloud actually uses Replicated engine and it "automagically" rewrites the engine name thus things "just work". This feature is not available on self-hosted CH instances.
Task 3: Update the docs and mention this change in the version support matrix.
https://docs.gitlab.com/integration/clickhouse/#supported-clickhouse-versions