
Eventual author consistency worker for CA

What does this MR do and why?

This MR adds a worker that ensures eventual consistency in the ClickHouse database for the Contribution Analytics feature. When a user is deleted from the PostgreSQL database, this worker ensures that the user's event records are also cleaned up in ClickHouse.

How does it work:

In ClickHouse, the event_authors table tracks the unique author IDs (user IDs) for the events table. The worker iterates over this table and checks whether each user still exists. If a user cannot be found in the PostgreSQL database, the worker deletes all events related to that user in ClickHouse.
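The check described above can be sketched in plain Ruby. This is only an illustration and not the worker's actual code: the method name `cleanup_orphaned_events`, the in-memory arrays standing in for the ClickHouse tables and the PostgreSQL users table, and the batching with `batch_size` are all assumptions made for the example.

```ruby
require "set"

# Hypothetical in-memory stand-ins (assumption): the real worker reads
# event_authors and events from ClickHouse and looks users up in PostgreSQL.
def cleanup_orphaned_events(event_author_ids, existing_user_ids, events, batch_size: 2)
  deleted_author_ids = []

  event_author_ids.each_slice(batch_size) do |batch|
    # One existence check per batch instead of one lookup per author.
    missing = batch.reject { |id| existing_user_ids.include?(id) }
    next if missing.empty?

    # Remove every event written by an author that no longer exists.
    events.reject! { |event| missing.include?(event[:author_id]) }
    deleted_author_ids.concat(missing)
  end

  deleted_author_ids
end

existing_user_ids = Set[1, 2, 4]      # users still present in PostgreSQL
event_author_ids = [1, 2, 3, 4, 5]    # rows of the event_authors table
events = [
  { id: 10, author_id: 1 },
  { id: 11, author_id: 3 },
  { id: 12, author_id: 3 },
  { id: 13, author_id: 4 },
  { id: 14, author_id: 5 }
]

removed = cleanup_orphaned_events(event_author_ids, existing_user_ids, events)
puts "Removed authors: #{removed.inspect}"
puts "Remaining events: #{events.map { |e| e[:id] }.inspect}"
```

Batching the existence check is a design choice of this sketch, not something the description states. It keeps the number of lookups proportional to the number of batches rather than the number of authors.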

How to set up and validate locally

  1. Ensure that you're on a Premium plan
  2. Ensure CH is configured: https://docs.gitlab.com/ee/development/database/clickhouse/clickhouse_within_gitlab.html#gdk-setup
  3. Enable the sync feature flag: Feature.enable(:event_sync_worker_for_click_house)
  4. If your GDK is seeded, you can sync the initial data to ClickHouse from the Rails console: ClickHouse::EventsSyncWorker.new.perform
  5. Find a user that has events and delete it:
author_id = Event.pluck(:author_id).uniq.sort.last
User.where(id: author_id).delete_all

# Verify if we have some data in ClickHouse for this author

ClickHouse::Client.select("select * from events where author_id = #{author_id}", :main)

# Invoke the worker
ClickHouse::EventAuthorsConsistencyCronWorker.new.perform

# Verify that the data is gone from ClickHouse

ClickHouse::Client.select("select * from events where author_id = #{author_id}", :main)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #428260 (closed)

Edited by Adam Hegyi
