Add soft delete option to ClickHouse events table

Adam Hegyi requested to merge ah-add-soft-delete-option-to-ch-events-table into master

What does this MR do and why?

This MR adds soft delete capability to the ClickHouse events table by leveraging the is_deleted option for the ReplacingMergeTree engine: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replacingmergetree#is_deleted

Reasoning: we don't have strong consistency between CH (ClickHouse) and PG (PostgreSQL). To ensure eventual consistency, we might periodically scan the CH and PG events tables and delete from CH the rows that are missing from PG (that is, rows already deleted in PG).
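A minimal sketch of what such a periodic reconciliation could compute, written in Python purely for illustration. The function name `build_tombstones`, the dictionary row layout, and the integer timestamps are assumptions and are not part of this MR; the only things taken from the MR are the `id`, `updated_at`, and `deleted` columns.

```python
def build_tombstones(ch_rows, pg_ids, now):
    """Return soft-delete rows for CH events whose id no longer exists in PG.

    ch_rows: latest visible version of each event in ClickHouse (hypothetical layout)
    pg_ids:  set of event ids that still exist in PostgreSQL
    now:     version to stamp on the tombstones; must be higher than the
             updated_at of the rows being deleted
    """
    tombstones = []
    for row in ch_rows:
        if row["deleted"] == 0 and row["id"] not in pg_ids:
            # Same row, newer version, flagged as deleted.
            tombstones.append({**row, "updated_at": now, "deleted": 1})
    return tombstones


ch_rows = [
    {"id": 1, "updated_at": 10, "deleted": 0},
    {"id": 3, "updated_at": 100, "deleted": 0},
]
pg_ids = {1}  # id=3 was deleted in PG
tombstones = build_tombstones(ch_rows, pg_ids, now=200)
```

The resulting rows would then be inserted into the CH events table instead of issuing a delete, so the cleanup stays append-only.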

Deployment

The feature is not available on production; at the moment we're running experiments on STG. Since we don't have a DB migration framework for CH yet, the schema changes will be applied by hand.

How to set up and validate locally

See the extended test case in the MR. How it works:

  1. You have a row in the events table with id=3.
  2. Insert a new "version" of the row with a higher updated_at timestamp and the deleted column set to 1.
  3. Running SELECT * FROM events FINAL applies the deduplication at query time, so the "deleted" rows are filtered out. (This step normally happens asynchronously.)
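The steps above can be modelled in a few lines of Python. This is only a sketch of the row-selection rule, not ClickHouse code: it assumes that `id` is the sorting key and that `updated_at` is the version column, neither of which is stated explicitly in this MR, and the helper `select_final` is hypothetical.

```python
from typing import Iterable


def select_final(rows: Iterable[dict]) -> list[dict]:
    """Model of SELECT ... FINAL on a ReplacingMergeTree with is_deleted.

    Per id, keep the row with the highest updated_at, then drop rows
    whose surviving version has deleted = 1.
    """
    latest: dict[int, dict] = {}
    for row in rows:
        current = latest.get(row["id"])
        # ">=" so that, on equal versions, the last inserted row wins.
        if current is None or row["updated_at"] >= current["updated_at"]:
            latest[row["id"]] = row
    return [row for row in latest.values() if row["deleted"] == 0]


events = [
    {"id": 2, "updated_at": 90, "deleted": 0},
    {"id": 3, "updated_at": 100, "deleted": 0},  # step 1: existing row
    {"id": 3, "updated_at": 200, "deleted": 1},  # step 2: newer "deleted" version
]
visible_ids = [row["id"] for row in select_final(events)]  # step 3
```

Before step 2 the row with id=3 is visible; after the newer version with deleted set to 1 is inserted, only id=2 remains in the result.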

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Adam Hegyi