CiFinishedBuildsSyncService inserts duplicate records in ClickHouse ci_finished_builds table
Summary
!132010 (merged) added a service that takes unprocessed p_ci_finished_build_ch_sync_events records and sends the corresponding denormalized ci_builds records to ClickHouse. However, in staging we noticed that the number of records inserted is much higher than the total number of records in the source and destination tables.
The problem seems to be the way that we loop through the unprocessed records. Each time we process a batch, we mark its records as processed and request the next batch from the database. However, that next read can easily be served by a replica that is not yet up to date, so it returns records that were already handled and they are processed again.
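The suspected failure mode can be illustrated with a small simulation. This is a toy model, not the actual service code: the `Database` class, the `sync` function, and the batch sizes are all hypothetical, and the only assumption carried over from the description above is that writes go to a primary while the next batch may be read from a replica that has not yet caught up.

```python
from dataclasses import dataclass


@dataclass
class SyncEvent:
    build_id: int
    processed: bool = False


class Database:
    """Hypothetical primary with a single read replica that lags behind."""

    def __init__(self, build_ids):
        self.primary = {i: SyncEvent(i) for i in build_ids}
        self.replica = {i: SyncEvent(i) for i in build_ids}

    def mark_processed(self, build_ids):
        # Writes always go to the primary.
        for build_id in build_ids:
            self.primary[build_id].processed = True

    def next_batch(self, size, from_replica):
        # Reads may be served by the replica, which never catches up
        # in this model (worst-case replication lag).
        source = self.replica if from_replica else self.primary
        pending = [e.build_id for e in source.values() if not e.processed]
        return sorted(pending)[:size]


def sync(db, batch_size, read_from_replica, max_batches):
    """Loop over unprocessed events; the returned list stands in for
    the rows inserted into the ClickHouse table."""
    inserted = []
    for _ in range(max_batches):
        batch = db.next_batch(batch_size, read_from_replica)
        if not batch:
            break
        inserted.extend(batch)
        db.mark_processed(batch)
    return inserted


# Reading each batch from the stale replica re-inserts the same records.
stale = sync(Database(range(1, 11)), batch_size=4,
             read_from_replica=True, max_batches=3)
# Reading each batch from the primary inserts every record exactly once.
fresh = sync(Database(range(1, 11)), batch_size=4,
             read_from_replica=False, max_batches=3)

print(len(stale), len(set(stale)))  # 12 4
print(len(fresh), len(set(fresh)))  # 10 10
```

In this model the stale read returns the same first batch three times, so 12 rows are inserted for only 4 distinct records, whereas reading from the primary inserts each of the 10 records once. Whether the real fix should pin reads to the primary or iterate by a cursor instead of re-querying is left open here.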
Steps to reproduce
Example Project
What is the current bug behavior?
When we enabled the feature flag, the number of inserted records reported for the first 3 batches far exceeded the total number of records in the source p_ci_finished_build_ch_sync_events table.
The source table contained 25355 records.
What is the expected correct behavior?
We should not insert the same records twice. Specifically, the following two queries, run in ClickHouse Cloud, should return the same count:
SELECT COUNT(*) FROM "ci_finished_builds";
SELECT countMerge(count_builds) FROM "ci_finished_builds_aggregated_queueing_delay_percentiles";
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
Results of GitLab application Check