Prepare GitLab ClickHouse DB for Siphon

As part of the MVP work, we plan to enable these tables for Siphon:

To receive this data in SaaS ClickHouse, we'll need to prepare the database schema in the GitLab application repo.

For each table:

  1. Create a new ReplacingMergeTree table prefixed with siphon_. The prefix signals that the table is populated by Siphon.
  2. For each column, look up its type in the supported data type matrix and apply the corresponding conversion (for example, bigint -> UInt64): https://gitlab.com/gitlab-org/architecture/gitlab-data-analytics/design-doc/-/blob/master/designs/logical_replication_mvp.md#supported-data-types
  3. Create a migration file that creates this database table on ClickHouse.
  4. Add a test case that verifies schema integrity (the column lists should match). This is important for detecting schema changes early.
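
Steps 2 and 4 could be sketched as follows. This is only an illustration in Python, not the GitLab implementation: the names `PG_TO_CLICKHOUSE`, `clickhouse_type`, and `schema_drift` are hypothetical, and only the bigint -> UInt64 entry comes from this issue. The other map entries are placeholders that would need to be checked against the data type matrix.

```python
# Hypothetical PostgreSQL -> ClickHouse type map. Only bigint -> UInt64 is
# taken from the issue text; the other entries are unverified placeholders.
PG_TO_CLICKHOUSE = {
    "bigint": "UInt64",
    "text": "String",    # assumption, check against the data type matrix
    "boolean": "Bool",   # assumption, check against the data type matrix
}

# Bookkeeping columns that exist only on the ClickHouse side.
SIPHON_COLUMNS = {"siphon_replicated_at", "siphon_deleted"}


def clickhouse_type(pg_type):
    """Translate one PostgreSQL type, failing loudly on unmapped types."""
    try:
        return PG_TO_CLICKHOUSE[pg_type]
    except KeyError:
        raise ValueError(f"no ClickHouse mapping for PostgreSQL type {pg_type!r}") from None


def schema_drift(pg_columns, ch_columns):
    """Return (missing_in_clickhouse, unexpected_in_clickhouse) column names."""
    pg = set(pg_columns)
    ch = set(ch_columns) - SIPHON_COLUMNS  # ignore the siphon bookkeeping columns
    return sorted(pg - ch), sorted(ch - pg)
```

A schema integrity test would then assert that `schema_drift(...)` returns two empty lists. A column added on the PostgreSQL side without a matching ClickHouse migration shows up in the first list, and a column dropped on the PostgreSQL side shows up in the second.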

Additionally, we need to add the following columns to each table (for the ReplacingMergeTree engine):

  • siphon_replicated_at (DateTime64)
  • siphon_deleted (Bool)
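
To illustrate why the engine needs these two columns, here is a small Python simulation of the intended deduplication. It is a sketch under assumptions: it treats siphon_replicated_at as the version column and siphon_deleted as the deletion marker, and the `collapse` function and the `id` key are hypothetical. ClickHouse itself applies this logic during background merges or when a query uses FINAL, not in application code.

```python
from datetime import datetime, timezone


def collapse(rows, key="id"):
    """Keep the newest version of each key, then drop rows marked deleted."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # On equal versions the later row wins, mirroring insertion order.
        if current is None or row["siphon_replicated_at"] >= current["siphon_replicated_at"]:
            latest[row[key]] = row
    return [row for row in latest.values() if not row["siphon_deleted"]]


def ts(second):
    """Build a UTC timestamp for the example rows."""
    return datetime(2025, 1, 1, 0, 0, second, tzinfo=timezone.utc)


rows = [
    {"id": 1, "title": "old", "siphon_replicated_at": ts(1), "siphon_deleted": False},
    {"id": 1, "title": "new", "siphon_replicated_at": ts(2), "siphon_deleted": False},
    {"id": 2, "title": "gone", "siphon_replicated_at": ts(1), "siphon_deleted": False},
    {"id": 2, "title": "gone", "siphon_replicated_at": ts(3), "siphon_deleted": True},
]
result = collapse(rows)
```

Here `result` contains only the row with id 1 and title "new": the older version of id 1 is superseded, and id 2 disappears because its newest version is marked as deleted.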

See the research issue for a concrete example.

  • Note 1: on self-managed instances, these tables will be empty for now.
  • Note 2: it might make sense to write a generator for this, since most of the information can be derived from the PostgreSQL table or the ActiveRecord model.
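
A generator along the lines of Note 2 might look roughly like this. It is a hedged Python sketch, not a proposal for the actual tooling: `render_create_table` is a hypothetical name, the columns are assumed to be already translated to ClickHouse types, and the DateTime64 precision, the time zone, the column defaults, and the engine arguments are all assumptions.

```python
def render_create_table(pg_table, columns, order_by):
    """Render CREATE TABLE DDL for the siphon_-prefixed ClickHouse table.

    columns:  list of (name, clickhouse_type) pairs, already translated.
    order_by: column names that form the sorting key.
    """
    lines = [f"  {name} {ch_type}" for name, ch_type in columns]
    # Bookkeeping columns; precision, time zone, and defaults are assumptions.
    lines.append("  siphon_replicated_at DateTime64(6, 'UTC') DEFAULT now()")
    lines.append("  siphon_deleted Bool DEFAULT FALSE")
    body = ",\n".join(lines)
    return (
        f"CREATE TABLE IF NOT EXISTS siphon_{pg_table}\n(\n{body}\n)\n"
        # Assumed engine arguments: version column, then deletion marker.
        "ENGINE = ReplacingMergeTree(siphon_replicated_at, siphon_deleted)\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


# Hypothetical table and columns, for illustration only.
ddl = render_create_table("milestones", [("id", "UInt64"), ("title", "String")], ["id"])
print(ddl)
```

The generated statement could then be pasted into the migration file from step 3 and reviewed by hand, which keeps the generator simple while still removing most of the manual work.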
Edited by Felipe Cardozo