Build a rechecksum worker and service capable of handling models with composite PK
What does this MR do and why?
This MR presents a new worker and associated service that trigger a re-checksum action on a Geo primary site.
The re-checksum action means that, for the selected model, all its associated verification_state values will be changed to :pending, effectively marking the record to be picked up by the verification worker cron job.
As the worker can be enqueued multiple times, we need to be careful to avoid workers doing useless / duplicate work.
The main advantage of this worker over existing ones is that it supports models whose PK is composite. See !206339 (comment 2787265985) for a more detailed discussion.
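To illustrate what batching over a composite PK involves, here is a minimal plain-Ruby sketch. It is not the code from this MR: the Row struct, the (partition_id, id) key columns, and the next_batch helper are all hypothetical, and an in-memory array stands in for a table. It only shows that a cursor over a composite key has to be compared as a tuple rather than as a single integer.

```ruby
# Hypothetical state row with a composite primary key (partition_id, id).
Row = Struct.new(:partition_id, :id, :verification_state)

# Returns up to batch_size rows whose composite key sorts strictly after cursor.
# cursor is nil (start from the beginning) or a [partition_id, id] pair.
def next_batch(rows, cursor, batch_size)
  ordered = rows.sort_by { |row| [row.partition_id, row.id] }
  if cursor
    # Array#<=> compares element by element, which matches tuple ordering.
    ordered = ordered.select { |row| ([row.partition_id, row.id] <=> cursor).positive? }
  end
  ordered.first(batch_size)
end

rows = [
  Row.new(1, 10, 2), Row.new(1, 20, 2),
  Row.new(2, 5, 2),  Row.new(2, 15, 2)
]

first_batch = next_batch(rows, nil, 3)
cursor = [first_batch.last.partition_id, first_batch.last.id] # => [2, 5]
second_batch = next_batch(rows, cursor, 3)                    # only the (2, 15) row
```

Note that the row (2, 5) sorts after (1, 20) even though its id is smaller, which is why comparing on a single column would not be enough in this hypothetical layout.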
References
Database plans
The new service can work with 17 models, all of which follow the same structure and have the same indexes. See the template from which they all stem.
The each_batch call runs the update on relation_after_cursor.verification_state_not_pending, which means:
- records from the first one up to the batch limit of 1000, as the cursor is typically only set when the job is retried or has timed out;
- verification_state != 0 (not pending).
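The batching behaviour described above can be sketched in plain Ruby. This is an illustration only, not the service from this MR: the in-memory hash stands in for a states table, mark_pending is a hypothetical helper, and the state value 2 is an arbitrary stand-in for any non-pending state (only 0 meaning pending is taken from the queries in this description). It shows why a second run, or a retry resuming from a cursor, does no duplicate work: rows already at 0 are filtered out.

```ruby
PENDING = 0 # matches verification_state = 0 in the queries in this description

# table: Hash of id => verification_state (stand-in for a *_states table).
# cursor: id to resume from, or nil to start from the first record.
# Returns the number of rows changed in each batch.
def mark_pending(table, batch_size:, cursor: nil)
  ids = table.keys.sort
  ids = ids.select { |id| id >= cursor } if cursor
  ids.each_slice(batch_size).map do |batch|
    # Equivalent of the verification_state != 0 filter: skip rows already pending.
    to_update = batch.reject { |id| table[id] == PENDING }
    to_update.each { |id| table[id] = PENDING }
    to_update.size
  end
end

table = { 1 => 2, 2 => 0, 3 => 2, 4 => 2, 5 => 2 }
first_run = mark_pending(table, batch_size: 2)  # => [1, 2, 1]
second_run = mark_pending(table, batch_size: 2) # => [0, 0, 0]
```

The second run touches no rows, which is the property that keeps repeated enqueues cheap in this sketch.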
I'm showing the plans for the first two batches of 1000 records, using the project_states and upload_states tables:
Project states
SQL:
UPDATE "project_states" SET "verification_state" = 0 WHERE "project_states"."verification_state" != 0 AND "project_states"."id" >= 2001000 AND "project_states"."id" < 2002001
Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135518
SQL:
UPDATE "project_states" SET "verification_state" = 0 WHERE "project_states"."verification_state" != 0 AND "project_states"."id" >= 2002001 AND "project_states"."id" < 2003001
Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135519
Upload states
SQL:
UPDATE "upload_states" SET "verification_state" = 0 WHERE "upload_states"."verification_state" != 0 AND "upload_states"."upload_id" >= 1000000000 AND "upload_states"."upload_id" < 1000001001
Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135722
SQL:
UPDATE "upload_states" SET "verification_state" = 0 WHERE "upload_states"."verification_state" != 0 AND "upload_states"."upload_id" >= 1000001001 AND "upload_states"."upload_id" < 1000002001
Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135723
How to set up and validate locally
First of all, you need a Geo-enabled GDK. See the advanced GDK docs on how to set up a secondary site.
- Select a class you'd like to test with. For example, Project.
- Make sure there are projects whose verification state is not pending. You can run Project.verification_pending in the Rails console; it should be empty or return only a subset of records.
- Run the service from this MR with Geo::BulkPrimaryVerificationService.new('project').async_execute
- Now run Project.verification_pending again; this time it should return all records, as they've all been marked as pending.
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.