Introduce re-insert logic for ClickHouse consumer
Problem
We want to build complex materialized views (MVs) in ClickHouse where the MV may JOIN related 1:1 tables to enrich rows with additional columns.
Consider the following structure:
In this example, the hierarchy_work_items
MV pulls in labels and assignees using JOINs.
However, when related tables (like siphon_issue_assignees
or work_item_label_links
) change, those changes are not automatically reflected in hierarchy_work_items because the MV only listens to changes on its source table (siphon_issues
).
Proposal
Introduce a mechanism that automatically re-inserts the parent row when related rows in other tables change.
Example configuration
A possible config for the ClickHouse consumers:
clickhouse:
# ...
refresh_on_change:
- table: 'siphon_issue_assignees'
parent_table: 'siphon_issues'
foreign_key: 'issue_id'
parent_key: 'id'
How would this work?
- When we receive a change event on the issue_assignees subject, the ClickHouse consumer:
Inserts the rows into siphon_issue_assignees
as usual.
Before acknowledging the message, it emits a logical replication event to the issues
subject with a special event type (e.g., type = 2
).
This event only contains the primary key values for the related parent rows.
- The goroutine processing the issues subject picks up this event.
- It recognizes the special type.
- For each primary key:
- It selects the latest row from
siphon_issues
. - It re-inserts that row with a new version value.
- It selects the latest row from
As a result, the hierarchy_work_items MV sees an update for the parent row, triggering a fresh JOIN — so labels and assignees stay in sync.
Benefits
- Keeps MVs up-to-date with changes from related tables.
- No need for costly periodic backfill jobs.
- Uses existing event streams with minimal overhead.