
Proposal: Handle database schema changes

Schema Change Propagation for Downstream Databases (ClickHouse)

This proposal describes an automated way to ensure that the downstream databases (ClickHouse, Iceberg?) can bring their database schema up to date before changed rows arrive at the consumers.

At a high level, we would observe database schema changes at development time and automatically write "migration" files in a platform-independent way.

These migration files would be translated to database-specific statements and then executed before the PostgreSQL schema change happens.

Handling Database Schema Changes in Siphon MVP

On the producer level, Siphon is not concerned with database schema changes. If the schema of a configured table changes, the Siphon producer automatically receives rows that reflect the new columns.

Example events for projects:

  1. Logical replication stream starts.
  2. Relation event (PG sends over the projects table schema: columns, types).
  3. BEGIN event.
  4. INSERT event to projects, data: id: 1, name: xyz ...
  5. COMMIT event.
  6. ALTER TABLE happens, projects schema is changed: new_column added.
  7. Relation event (PG sends over the projects table schema: columns, types).
  8. BEGIN event.
  9. INSERT event to projects, data: id: 2, name: asd, new_column: x ...
  10. COMMIT event.

The Siphon producer ensures that event packages (messages for NATS) for a given table always conform to a specific schema. In the example above, the two inserts would end up in separate packages because their schemas differ.

On the consumer level, before invoking INSERT, we request the database schema of the ClickHouse table and take the list of columns. For each event package, we exclude columns that are not present in the ClickHouse table schema.

After this step, the INSERT is executed. This means that in the MVP, ingestion will not break when a database schema change happens on the PostgreSQL instance. Worst-case scenario: the ClickHouse consumer will not ingest the new column.
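The consumer-side column filtering could be sketched roughly like this (a minimal illustration; the method name and package shape are assumptions, not the actual Siphon consumer code):

```ruby
require "set"

# Hypothetical sketch of the MVP consumer behavior: drop any columns from an
# event package that do not (yet) exist in the ClickHouse table schema.
def filter_to_known_columns(rows, clickhouse_columns)
  allowed = clickhouse_columns.to_set
  rows.map { |row| row.select { |column, _value| allowed.include?(column) } }
end

# `new_column` was added on PostgreSQL but does not exist in ClickHouse yet,
# so it is excluded before the INSERT is built:
rows = [{ "id" => 2, "name" => "asd", "new_column" => "x" }]
filter_to_known_columns(rows, %w[id name])
# => [{ "id" => 2, "name" => "asd" }]
```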

Requirements

For the PostgreSQL database we generally use safe migration techniques, which we think can mostly be translated to other database systems. For example, when we rename a database column, we don't invoke RENAME COLUMN. Instead, we add a new column, backfill it, and then at some point swap the database columns.

Within this issue, we intend to handle:

  • ADD COLUMN
  • DROP COLUMN
  • Swap columns (a special case where a column rename happens in one transaction)
  • CHANGE COLUMN (just to verify nothing destructive happens, such as type changes)

Out of scope:

  • CREATE/DROP TABLE: we don't need to automatically pick up new tables
  • ADD/DROP PARTITION: will be handled in a separate issue

Pre and Post Deployment Migrations

In our migration tooling, there are two phases when database migrations are executed on PostgreSQL:

  • Pre-deployment (standard migrations)
    • Invoked before the new version of the application starts.
    • Both old and new versions of the application are compatible with the schema changes.
  • Post-deployment
    • Invoked after the old version of the application has completely stopped.
    • Only the new version of the application is compatible with the schema changes.

Introduction of "Generic Migrations"

Database schema changes are implemented in the GitLab repository, where there are many helper methods to invoke various schema-altering statements.

Within the schema handling code, an extension could be implemented that observes the DDL changes and automatically translates them to a DB-independent format.

Example migration, adding a column to issues: migration/20250225235323_add_title_to_issues.rb

def up
  add_column :issues, :title, :text
end

The code above is turned into SQL by Rails:

ALTER TABLE issues ADD COLUMN title text;

The task here is to translate the SQL statement to a generic format (example format):

{
  "table": "issues",
  "schema": "public",
  "operation": {
    "type": "add_column",
    "name": "title",
    "data_type": "string"
  }
}

Having the statement in a platform-independent format, we can generate the ClickHouse statement:

ALTER TABLE siphon_issues ADD COLUMN title String;
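The translation step could be sketched as follows. This is a hypothetical sketch, not the PoC implementation: the `data_type` key and the type map are assumptions about the generic format; the `siphon_` prefix and the `String` mapping follow the example above.

```ruby
require "json"

# Hypothetical translator from the generic migration format to ClickHouse DDL.
PG_TO_CLICKHOUSE_TYPES = {
  "string"  => "String",
  "integer" => "Int64",
  "boolean" => "Bool"
}.freeze

def to_clickhouse_ddl(migration)
  table = "siphon_#{migration.fetch("table")}"
  op = migration.fetch("operation")

  case op.fetch("type")
  when "add_column"
    "ALTER TABLE #{table} ADD COLUMN #{op.fetch("name")} #{PG_TO_CLICKHOUSE_TYPES.fetch(op.fetch("data_type"))};"
  when "drop_column"
    "ALTER TABLE #{table} DROP COLUMN #{op.fetch("name")};"
  else
    raise ArgumentError, "unsupported operation: #{op["type"]}"
  end
end

migration = JSON.parse('{"table":"issues","schema":"public","operation":{"type":"add_column","name":"title","data_type":"string"}}')
to_clickhouse_ddl(migration)
# => "ALTER TABLE siphon_issues ADD COLUMN title String;"
```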

This conversion seems very simple. However, there is more work to be done. There will be materialized views defined on top of every siphon_* table, which means the materialized view definitions would need to be recreated in a consistent way. This usually means creating a new materialized view that includes the new column and then dropping the old materialized view (without dropping the underlying materialized view target table).
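As a sketch, the recreation after an ADD COLUMN could look like this in ClickHouse DDL (all object names are illustrative; the TO clause means the new view keeps writing into the existing target table):

```sql
-- Illustrative only: recreate the view with the new column included.
CREATE MATERIALIZED VIEW siphon_issues_mv_new TO siphon_issues_mv_target AS
SELECT id, name, title  -- `title` is the newly added column
FROM siphon_issues;

-- Drop only the old view definition; the underlying target table
-- (siphon_issues_mv_target) is kept because the new view writes into it.
DROP TABLE siphon_issues_mv;
```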

How It Would Work in Production (PRD)

The idea is to bring the downstream databases up to date before the PostgreSQL database is updated.

Example flow:

  1. Deployment starts.
  2. Pre-deployment migration phase:
    1. Read the generic migration files.
    2. For each data store we handle (ClickHouse, Iceberg), apply the changes.
    3. Apply the PostgreSQL migrations.
  3. New and old versions of the application run at the same time.
  4. Post-deployment migration phase:
    1. Read the generic migration files.
    2. For each data store we handle (ClickHouse, Iceberg), apply the changes.
    3. Apply the PostgreSQL migrations.
  5. Deployment complete.
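The flow above could be sketched as a small runner (all names here are hypothetical; `store.apply` and `postgres.migrate` stand in for real clients):

```ruby
# Hypothetical pre/post-deployment runner: downstream data stores are
# brought up to date before PostgreSQL itself is migrated.
def run_migration_phase(phase, generic_migrations, data_stores, postgres)
  pending = generic_migrations.select { |m| m[:phase] == phase }

  # 1-2. Read the generic migrations and apply them to each downstream store...
  data_stores.each do |store|
    pending.each { |migration| store.apply(migration) }
  end

  # 3. ...and only then run the PostgreSQL migrations for this phase.
  postgres.migrate(phase)
end
```

The ordering is the important part: ClickHouse (and later Iceberg) must already accept the new shape of the rows by the time the PostgreSQL schema change produces them.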

PoC MR: gitlab-org/gitlab!188058 (closed)

De-risking

  • Implement schema validation into GitLab CI to ensure that schema changes are correctly propagated to ClickHouse (e.g., after migrations, the column list matches).
  • Investigate the feasibility of building a DB-testing pipeline for ClickHouse so schema changes can be tested on a production-like dataset.

ClickHouse research

Given a ReplacingMergeTree table (siphon table) with one or more materialized views, let's assume a column is planned to be removed from the source table.

How would we alter the siphon table and materialized views in a way that won't break queries targeting the materialized view?

  1. For each materialized view, create a temporary materialized view pointing to the same table, excluding the column to be deleted. (This causes double inserts, but we deduplicate at query time.)
  2. Use EXCHANGE to swap the materialized views so application code doesn't need to change.
  3. Remove the column from the MV table(s).
  4. Remove the column from the siphon table.
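Assuming the materialized views write into explicit target tables via the TO clause, the steps above could translate into DDL along these lines (all object names are illustrative):

```sql
-- 1. Temporary MV over the same target table, excluding the dropped column:
CREATE MATERIALIZED VIEW issues_mv_tmp TO issues_mv_target AS
SELECT id, name FROM siphon_issues;  -- `title` excluded

-- 2. Swap so application code keeps querying `issues_mv`, then drop the
--    leftover definition (the target table is untouched):
EXCHANGE TABLES issues_mv AND issues_mv_tmp;
DROP TABLE issues_mv_tmp;

-- 3. Remove the column from the MV target table:
ALTER TABLE issues_mv_target DROP COLUMN title;

-- 4. Remove the column from the siphon table:
ALTER TABLE siphon_issues DROP COLUMN title;
```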
Edited by Adam Hegyi