Skip to content

Optionally support Replicated tables in ClickHouse

Problem: Our current ClickHouse migration framework assumes that we use ClickHouse Cloud or a single-node ClickHouse instance.

This is a problem because on a multi-node ClickHouse instance database tables need to be created differently:

  • Current: ENGINE=MergeTree
  • On multi-node ClickHouse instance (Replicated DB engine): ENGINE=ReplicatedMergeTree

Replacement regex:

"Engine =MergeTree".gsub(/(Engine\s*=\s*)(\w+)/i, '\1Replicating\2')

# Engine =ReplicatingMergeTree

Note: ensure that we do this replacement for MergeTree engines only.

Task 1: Update the ClickHouse migration framework to handle this seamlessly.

  1. Before the migration runs, look up the database engine.
  2. If the engine is replicated, on each SQL statement, scan for ENGINE and replace the engine name automatically.

Query for determining the database engine:

SELECT engine
FROM system.databases WHERE name = 'database name' 
LIMIT 1;

Task 2: Dealing with structure.sql

When structure.sql is written, always ensure we store the non-replicated engine name. When loading the structure.sql file, ensure that we properly replace the engine names with the replicated engine name if the database engine is replicated.

  • Atomic: no replacement needed
  • Replicated: replacement needed

Note: ClickHouse Cloud actually uses Replicated engine and it "automagically" rewrites the engine name thus things "just work". This feature is not available on self-hosted CH instances.

Task 3: Update the docs and mention this change in the version support matrix.

https://docs.gitlab.com/integration/clickhouse/#supported-clickhouse-versions

Edited by 🤖 GitLab Bot 🤖