Skip to content

Introduce re-insert logic for ClickHouse consumer

Problem

We want to build complex materialized views (MVs) in ClickHouse where the MV may JOIN related 1:1 tables to enrich rows with additional columns.

Consider the following structure:

image

In this example, the hierarchy_work_items MV pulls in labels and assignees using JOINs. However, when related tables (like siphon_issue_assignees or work_item_label_links) change, those changes are not automatically reflected in hierarchy_work_items because the MV only listens to changes on its source table (siphon_issues).

Proposal

Introduce a mechanism that automatically re-inserts the parent row when related rows in other tables change.

Example configuration

A possible config for the ClickHouse consumers:

clickhouse:
  # ...
  refresh_on_change:
    - table: 'siphon_issue_assignees'
      parent_table: 'siphon_issues'
      foreign_key: 'issue_id'
      parent_key: 'id'

How would this work?

  1. When we receive a change event on the issue_assignees subject, the ClickHouse consumer:

Inserts the rows into siphon_issue_assignees as usual.

Before acknowledging the message, it emits a logical replication event to the issues subject with a special event type (e.g., type = 2).

This event only contains the primary key values for the related parent rows.

  1. The goroutine processing the issues subject picks up this event.
  • It recognizes the special type.
  • For each primary key:
    • It selects the latest row from siphon_issues.
    • It re-inserts that row with a new version value.

As a result, the hierarchy_work_items MV sees an update for the parent row, triggering a fresh JOIN — so labels and assignees stay in sync.

Benefits

  • Keeps MVs up-to-date with changes from related tables.
  • No need for costly periodic backfill jobs.
  • Uses existing event streams with minimal overhead.