
Schedule materialized view refresh

Adam Hegyi requested to merge 431453-schedule-materialized-view-refresh into master

What does this MR do and why?

This MR implements a scheduled cron worker for refreshing the contributions materialized view. At a later step some of the logic can be reused to implement refreshing other materialized views.

Refreshing a materialized view can take a long time, so the Sidekiq job needs to be able to resume the refresh from the point where the previous job stopped. To make this possible, we persist the current state in Redis.

How it works:

  1. Invoke the worker.
  2. If the previous run happened recently (within 1 week), do nothing.
  3. Otherwise invoke the service and pass in the state which is taken from Redis.
  4. The state might include the next_value (id) variable which tells the service to continue the processing from a specific point.
  5. If the time limit is reached, return the over_time status and include the next value.
  6. If the whole table is processed, return the finished status.
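
The steps above can be sketched roughly as follows. This is a simplified illustration, not the code from this MR: `StateStore` (an in-memory stand-in for Redis), `RebuildService`, `RebuildWorker`, the `finished_at` key, and the batch-count budget (used instead of a wall-clock time limit so the example is deterministic) are all assumptions made for the sketch. Only the `next_value` variable and the `over_time` / `finished` statuses come from the description.

```ruby
require 'json'

REBUILD_INTERVAL = 7 * 24 * 60 * 60 # 1 week, in seconds

# Hypothetical in-memory stand-in for Redis: stores state as JSON strings.
class StateStore
  def initialize
    @data = {}
  end

  def read(key)
    raw = @data[key]
    raw ? JSON.parse(raw, symbolize_names: true) : {}
  end

  def write(key, state)
    @data[key] = JSON.generate(state)
  end
end

# Hypothetical service: copies ids in batches and stops once the budget is used up.
class RebuildService
  Result = Struct.new(:status, :next_value)

  attr_reader :processed

  def initialize(ids:, batch_size:, max_batches:)
    @ids = ids
    @batch_size = batch_size
    @max_batches = max_batches
    @processed = []
  end

  def execute(state)
    start = state[:next_value] || 0
    batches = 0

    @ids.select { |id| id >= start }.sort.each_slice(@batch_size) do |batch|
      # Budget exhausted: report where the next run should continue.
      return Result.new(:over_time, batch.first) if batches >= @max_batches

      @processed.concat(batch)
      batches += 1
    end

    Result.new(:finished, nil)
  end
end

# Hypothetical worker: skips recent rebuilds, otherwise resumes from stored state.
class RebuildWorker
  KEY = 'rebuild_mv_state'

  def initialize(store:, service:, clock: -> { Time.now.to_i })
    @store = store
    @service = service
    @clock = clock
  end

  def perform
    state = @store.read(KEY)
    finished_at = state[:finished_at]
    return :skipped if finished_at && @clock.call - finished_at < REBUILD_INTERVAL

    result = @service.execute(state)
    if result.status == :over_time
      @store.write(KEY, { next_value: result.next_value })
    else
      @store.write(KEY, { finished_at: @clock.call })
    end
    result.status
  end
end
```

With ten ids, a batch size of 2, and a budget of two batches per run, three consecutive `perform` calls would return `:over_time`, `:over_time`, and `:finished`, and a fourth call within the following week would return `:skipped`.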

The job is scheduled to run every 10 minutes, but in practice it will be a no-op most of the time because of the 1-week waiting period.

The MV rebuilding is guarded with two feature flags:

  • rebuild_contributions_mv: enables/disables the worker
  • rebuild_mv_drop_old_tables: drops the old MV tables. For safety reasons, this is disabled by default. I'll need to do some checks in ClickHouse (CH) and remove the old table manually...
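
As a rough illustration of how the two flags might interact, here is a minimal sketch. The `Flags` class and the `rebuild_steps` helper are hypothetical and only model the gating described above; they are not GitLab's `Feature` API or the code in this MR.

```ruby
# Hypothetical stand-in for a feature flag registry.
class Flags
  def initialize(enabled = [])
    @enabled = enabled
  end

  def enabled?(name)
    @enabled.include?(name)
  end
end

# Returns the steps that would run for a given flag configuration.
def rebuild_steps(flags)
  # The first flag gates the whole worker.
  return [] unless flags.enabled?(:rebuild_contributions_mv)

  steps = [:rebuild_view]
  # Dropping the old tables is opt-in, so it only runs when explicitly enabled.
  steps << :drop_old_tables if flags.enabled?(:rebuild_mv_drop_old_tables)
  steps
end
```

Under these assumptions, enabling only `rebuild_contributions_mv` rebuilds the view but leaves the old tables in place for manual inspection and removal.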

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

Enable the feature flag and run the worker:

Feature.enable(:rebuild_contributions_mv)

ClickHouse::RebuildMaterializedViewCronWorker.new.perform

# Verify that the data is the same in tmp and original table:

ClickHouse::Client.select("select distinct id from contributions", :main).pluck("id").sort == ClickHouse::Client.select("select distinct id from tmp_contributions", :main).pluck("id").sort

# Cleanup: drop the tmp tables manually

ClickHouse::Client.execute("drop table tmp_contributions", :main)
ClickHouse::Client.execute("drop table tmp_contributions_mv", :main)

Related to #431453 (closed)

Edited by Adam Hegyi
