Org Mover - Move all verification details into separate tables
## Problem
https://gitlab.com/gitlab-org/gitlab/-/merge_requests/174453#note_2256385015
>>>
note: This scope is used by the `Geo::VerifiableModel.pluck_verifiable_ids_in_range` and `Geo::VerifiableModel#in_verifiables?` methods. The `range` in both methods is, at most, a range of IDs spanning 1_000 records. We should refactor this to receive the range to avoid accidental usage within large tables like `ci_job_artifacts`.
The `verifiables` scope is also used by the `available_verifiables` scope when the replicable holds the verification details in the same table, by several scopes in the [Geo::VerificationState](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/concerns/geo/verification_state.rb#L42-54) concern, and by the [Geo::VerifiableReplicator](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/concerns/geo/verifiable_replicator.rb#L173) class.
We need to split the verification details for these four replicables into separate tables. With these changes, the `Geo::VerificationStateBackfillWorker`, which iterates over the replicable's table to backfill the corresponding verification state table, will also clean up records when we change the selective sync scope to a new set of organizations.
>>>
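The note above asks that the range be passed in so the lookup stays bounded. A minimal plain-Ruby sketch of that idea (the method name mirrors `pluck_verifiable_ids_in_range`, but the implementation and the `Record` struct are hypothetical, not actual GitLab code):

```ruby
# Sketch: instead of building an unbounded `verifiables` relation and
# filtering afterwards, the caller passes an ID range (at most 1_000 IDs
# wide), so the query can never scan a large table end to end.
Record = Struct.new(:id, :verifiable)

def pluck_verifiable_ids_in_range(records, range)
  # Guard against accidental wide-range usage, per the note above.
  raise ArgumentError, 'range wider than 1_000 IDs' if range.size > 1_000

  records.select { |r| range.cover?(r.id) && r.verifiable }.map(&:id)
end

records = [Record.new(1, true), Record.new(2, false), Record.new(3, true), Record.new(1_500, true)]
pluck_verifiable_ids_in_range(records, 1..1_000) # => [1, 3]
```

In the real codebase the bound would live in the scope-building SQL rather than in-memory filtering; the sketch only shows the contract change of receiving the range explicitly.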
## Replicable models that store the verification state in the same table
- Model: `Ci::PipelineArtifact`
  - Immutable
  - Schema: `gitlab_ci`
  - Table: `ci_pipeline_artifacts`
- Model: `Packages::PackageFile`
  - Immutable
  - Schema: `gitlab_main`
  - Table: `packages_package_files`
  - [Large and high-traffic table on GitLab.com](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml)
- Model: `SnippetRepository`
  - Mutable
  - Schema: `gitlab_main`
  - Table: `snippet_repositories`
- Model: `Terraform::StateVersion`
  - Immutable
  - Schema: `gitlab_main`
  - Table: `terraform_state_versions`
## Proposal
1. Release N
   1. Create the four verification state tables in regular schema migrations.
   2. Update the application to [calculate the checksum](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/concerns/geo/blob_replicator_strategy.rb#L144-155) by reading from the original column for the immutable data types (`Ci::PipelineArtifact`, `Packages::PackageFile`, `Terraform::StateVersion`) during Geo's usual verification processes, which backfill the separate tables. This avoids resource usage (CPU, network, etc.) for immutable data types during the backfill phase.
   3. Let Geo's usual verification processes backfill the separate table for the mutable data type (`SnippetRepository`). There is only one, and it is not a particularly large table. This prevents us from re-introducing bugs we experienced in the past, like https://gitlab.com/gitlab-org/gitlab/-/issues/387980.
2. Release N + 1
   1. Update the application to read the verification state from the new tables for the immutable data types (`Ci::PipelineArtifact`, `Packages::PackageFile`, `Terraform::StateVersion`) during Geo's usual verification processes.
   2. Ignore the original columns. This starts the process of safely removing database columns, as described in our [guides](https://docs.gitlab.com/ee/development/database/avoiding_downtime_in_migrations.html#dropping-columns).
3. Release N + 2
   1. Drop the original columns.
4. Release N + 3
   1. Remove the ignore rule for the original columns.
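The core of step 2 in Release N is that immutable data types can copy the already-computed checksum from the original row instead of recomputing it. A plain-Ruby sketch of that decision (the `backfill_state` helper, field names, and SHA-256 choice are illustrative assumptions, not the actual GitLab implementation):

```ruby
require 'digest'

# Sketch: when backfilling the separate verification state table, reuse the
# checksum already stored on the original row for immutable types (no CPU or
# network cost); recompute from the data only for mutable types, where the
# stored value may be stale.
def backfill_state(row, immutable:)
  checksum =
    if immutable && row[:verification_checksum]
      row[:verification_checksum]            # copy the existing checksum
    else
      Digest::SHA256.hexdigest(row[:data])   # recompute for mutable types
    end

  { record_id: row[:id], verification_checksum: checksum }
end

artifact = { id: 42, data: 'blob', verification_checksum: 'abc123' }
backfill_state(artifact, immutable: true)
# => { record_id: 42, verification_checksum: "abc123" }
```

In production this would run inside Geo's existing verification workers rather than as a standalone helper; the sketch only captures why immutable types make the backfill cheap.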
While this is a lengthy process, it does not require much effort. It avoids having the application maintain both the old and new columns, which can lead to unknown edge cases or re-introduce bugs experienced in the past. Step 2, `Release N + 1`, should happen after a required stop. The next one is scheduled for 17.11.