
Build a re-checksum worker and service capable of handling models with composite PKs

What does this MR do and why?

This MR introduces a new worker and an associated service that trigger a re-checksum on a Geo primary site. A re-checksum means that, for the selected model, every associated verification_state value is set to :pending, which marks each record to be picked up by the verification worker cron job.

Because the worker can be enqueued multiple times, we need to be careful that jobs don't do useless or duplicate work.

The main advantage of this worker over the existing ones is that it supports models whose primary key is composite. See !206339 (comment 2787265985) for a more detailed discussion.
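With a composite primary key there is no single integer column to split into `id >= start AND id < stop` ranges, so batching has to walk the key tuples in order instead. The following is a minimal plain-Ruby sketch of keyset (cursor) pagination over composite keys. It is not the code from this MR: the key columns (partition_id, record_id) and the helper each_keyset_batch are hypothetical, and the in-memory array stands in for an indexed table.

```ruby
# Hypothetical composite keys, e.g. (partition_id, record_id), already sorted.
keys = [[1, 1], [1, 2], [1, 5], [2, 1], [2, 3], [3, 7]]

# Yields successive batches of keys that sort strictly after the cursor tuple.
# Array#<=> compares element by element, like PostgreSQL's row comparison
# `(partition_id, record_id) > (?, ?)`. A real query would use an index
# instead of scanning the whole list on every iteration.
def each_keyset_batch(sorted_keys, batch_size, cursor: nil)
  loop do
    remaining = cursor ? sorted_keys.select { |key| (key <=> cursor) == 1 } : sorted_keys
    batch = remaining.first(batch_size)
    break if batch.empty?

    yield batch
    cursor = batch.last # the last key of a batch becomes the next cursor
  end
end

batches = []
each_keyset_batch(keys, 4) { |batch| batches << batch }
# batches => [[[1, 1], [1, 2], [1, 5], [2, 1]], [[2, 3], [3, 7]]]
```

Passing a saved cursor, for example `cursor: [1, 5]`, resumes with the keys that sort after that tuple, which is what a retried job needs.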

References

#537709

Database plans

The new service works with 17 models, but they all follow the same structure and have the same indexes. See the template they all stem from.

each_batch performs the update on relation_after_cursor.verification_state_not_pending, which means:

  • rows from the first record up to the batch limit of 1000, since the cursor is typically only set when the job is retried or times out;
  • rows where verification_state != 0 (not pending).
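A minimal in-memory sketch of the batching described above, assuming integer ids and treating 0 as pending, as the SQL plans below do. The names run_job, StateRow and max_batches, and the simulated timeout, are hypothetical; in the MR the work is done by each_batch over relation_after_cursor.verification_state_not_pending.

```ruby
# Hypothetical in-memory stand-in for a row of a *_states table.
StateRow = Struct.new(:id, :verification_state)

PENDING = 0 # `verification_state = 0` in the SQL plans
BATCH_SIZE = 1000

# Marks non-pending rows after the cursor as pending, one batch at a time.
# Stops after max_batches to simulate a timeout and returns the cursor to
# persist for the retry; returns nil once everything has been processed.
# Assumes rows are sorted by id.
def run_job(rows, cursor: nil, max_batches: Float::INFINITY)
  # In-memory equivalent of relation_after_cursor.verification_state_not_pending
  candidates = rows.select do |row|
    (cursor.nil? || row.id > cursor) && row.verification_state != PENDING
  end

  batches_run = 0
  candidates.each_slice(BATCH_SIZE) do |batch|
    return cursor if batches_run >= max_batches # "timed out": hand back the cursor

    batch.each { |row| row.verification_state = PENDING }
    cursor = batch.last.id
    batches_run += 1
  end
  nil
end

rows = (1..2500).map { |id| StateRow.new(id, 2) } # 2 = any non-pending state
saved_cursor = run_job(rows, max_batches: 2)      # stops after ids 1..2000
run_job(rows, cursor: saved_cursor)               # the retry finishes ids 2001..2500
```

Because already-pending rows are filtered out, running the job again after it has finished selects no rows and updates nothing, which keeps duplicate enqueues cheap.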

I'm showing the plans for the first two batches of 1000 records, using the project_states and upload_states tables:

Project states

SQL:

UPDATE "project_states" SET "verification_state" = 0 WHERE "project_states"."verification_state" != 0 AND "project_states"."id" >= 2001000 AND "project_states"."id" < 2002001

Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135518

SQL:

UPDATE "project_states" SET "verification_state" = 0 WHERE "project_states"."verification_state" != 0 AND "project_states"."id" >= 2002001 AND "project_states"."id" < 2003001

Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135519

Upload states

SQL:

UPDATE "upload_states" SET "verification_state" = 0 WHERE "upload_states"."verification_state" != 0 AND "upload_states"."upload_id" >= 1000000000 AND "upload_states"."upload_id" < 1000001001

Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135722

SQL:

UPDATE "upload_states" SET "verification_state" = 0 WHERE "upload_states"."verification_state" != 0 AND "upload_states"."upload_id" >= 1000001001 AND "upload_states"."upload_id" < 1000002001

Query plan: https://postgres.ai/console/gitlab/gitlab-production-main/sessions/44232/commands/135723
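Since all 17 models share the same structure, the batch statements differ only in the table name, the primary key column and the range bounds. The hypothetical helper below renders one batch's UPDATE in the exact shape shown in the plans above, purely to make that shared structure explicit. In the MR these statements are generated by ActiveRecord, and interpolating values into SQL like this is only acceptable for trusted, illustrative input.

```ruby
# Hypothetical helper: renders one batch UPDATE in the shape of the plans above.
# Only the table name and the primary key column change between models.
def batch_update_sql(table, pk_column, start_id, stop_id, pending: 0)
  %(UPDATE "#{table}" SET "verification_state" = #{pending} ) +
    %(WHERE "#{table}"."verification_state" != #{pending} ) +
    %(AND "#{table}"."#{pk_column}" >= #{start_id} ) +
    %(AND "#{table}"."#{pk_column}" < #{stop_id})
end

project_sql = batch_update_sql("project_states", "id", 2_001_000, 2_002_001)
upload_sql = batch_update_sql("upload_states", "upload_id", 1_000_000_000, 1_000_001_001)
```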

How to set up and validate locally

First of all, you need a Geo-enabled GDK. See the advanced GDK docs on how to set up a secondary site.

  1. Select a class you'd like to test with. For example, Project.
  2. Make sure there are projects whose verification state is not pending. Run Project.verification_pending in the Rails console; it should be empty or return only a subset of records.
  3. Run the service from this MR with Geo::BulkPrimaryVerificationService.new('project').async_execute
  4. Run Project.verification_pending again; this time it should return all records, since they have all been marked as pending.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Chloe Fons
