
Periodic events sync worker for ClickHouse

Adam Hegyi requested to merge 414937-implement-periodical-sync-worker into master

What does this MR do and why?

This MR implements a worker that periodically syncs event data to ClickHouse. Data syncing is SaaS-only for now.

How it works:

  1. The job is enqueued by cron.
  2. The worker checks whether it is enabled (ClickHouse is configured and the feature flag is on).
  3. It acquires a distributed lock using Redis.
  4. It loads the cursor from ClickHouse so syncing can continue from a particular events.id value (ClickHouse::SyncCursor class).
  5. It builds an enumerator that yields events using EachBatch.
  6. The enumerator is passed to a Gzip CSV writer.
  7. When the threshold (5k rows) is reached, the CSV file is closed and uploaded to ClickHouse.
  8. A new loop and a new CSV file are started.
  9. When time is up or no more data is left, processing stops and the cursor is updated with the last inserted events.id.
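The batching loop in steps 5–9 can be sketched as follows. This is a self-contained illustration, not the worker's actual code: `fetch_events`, `sync`, `gzip_csv`, and the 3-row threshold are stand-ins (the real worker uses `EachBatch` against PostgreSQL, a 5k-row threshold, Redis locking, and `ClickHouse::SyncCursor`).

```ruby
require "csv"
require "stringio"
require "zlib"

FLUSH_THRESHOLD = 3 # illustrative; the worker flushes every 5_000 rows

# Stand-in for reading events after the cursor position; the worker
# iterates the events table in batches via EachBatch.
def fetch_events(after_id)
  ((after_id + 1)..(after_id + 7)).map { |id| { id: id, action: "pushed" } }
end

# Compress a batch of rows into a gzipped CSV payload; the worker uploads
# each such file to ClickHouse.
def gzip_csv(rows)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  gz.write(CSV.generate { |csv| rows.each { |row| csv << row.values } })
  gz.close
  io.string
end

# Run one sync pass: buffer rows, flush a gzipped CSV at the threshold,
# and advance the cursor to the last inserted events.id.
def sync(cursor)
  uploads = []
  buffer = []

  fetch_events(cursor).each do |event|
    buffer << event
    next if buffer.size < FLUSH_THRESHOLD

    uploads << gzip_csv(buffer)
    cursor = buffer.last[:id]
    buffer = []
  end

  unless buffer.empty?
    uploads << gzip_csv(buffer)
    cursor = buffer.last[:id]
  end

  [uploads, cursor]
end

uploads, cursor = sync(0)
# 7 rows with a threshold of 3 => 3 files (3 + 3 + 1 rows), cursor => 7
```

The cursor is only advanced after a file is handed off, so a crashed run resumes from the last successfully inserted events.id rather than skipping or duplicating a partial batch.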

PG queries

The worker will mostly read data from replicas.
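The reads stay replica-friendly because EachBatch walks the table in bounded primary-key ranges rather than one large scan. A minimal, self-contained imitation of that iteration pattern (plain Ruby, not the ActiveRecord `each_batch` API itself):

```ruby
# Stand-in for the events table, ordered by primary key.
EVENTS = (1..10).map { |id| { id: id } }

# Yield rows in fixed-size slices, mimicking how EachBatch keeps each
# database query small and bounded.
def each_batch(of:)
  EVENTS.each_slice(of) { |batch| yield batch }
end

batches = []
each_batch(of: 4) { |batch| batches << batch.map { |event| event[:id] } }
# batches => [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```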

How to set up and validate locally

  1. Ensure that ClickHouse is set up: https://docs.gitlab.com/ee/development/database/clickhouse/clickhouse_within_gitlab.html#configure-your-rails-application
  2. Enable the feature flag: Feature.enable(:event_sync_worker_for_click_house)
  3. Invoke the worker: ClickHouse::EventsSyncWorker.new.perform
  4. Verify that the events table on CH is populated: ClickHouse::Client.select("select * from events", :main)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #414937 (closed)

Edited by Adam Hegyi
