Geo: Improve query performance of the latest blob types when object storage is not synced
Follow up for #224634 (comment 380104627) / !36771 (comment 380206862):
It sounds like it is reasonable to move forward with a nullable
packages_package_files.file_storecolumn for now, since there is only 1 record withfile_storeNULLin production, it is an exceptional case rather than a normal one. I'll open a follow up issue to make all thefile_storecolumns non-nullable for Geo queries.
When sync_object_storage is false, RegistryConsistencyService calls methods which use the clause WHERE file_store = 1. We need to:
- index that field.
- make it non-nullable.
- So we don't have to add
OR file_store IS NULLto ensure we process all records, which harms performance (see the performance improvement when we removed this clause from LFS objects).
- So we don't have to add
Proposal
-
Add index on packages_package_files.file_store=> !36771 (merged) -
Add index on merge_request_diffs.external_diff_store=> !37265 (merged) -
Add index on terraform_states.file_store=> !37265 (merged) -
Add index on vulnerability_exports.file_store=> !37265 (merged) -
Make field non-nullable: packages_package_files.file_store -
Make field non-nullable: merge_request_diffs.external_diff_store - [-]
Make field non-nullable:The Terraform State data model is changing, and the file_store field will be non-nullableterraform_states.file_store
- [ ] Make field non-nullable: vulnerability_exports.file_store
A wrinkle: "Make field non-nullable" was a multi-release process for attachments/job artifacts/lfs objects: #213382 (closed)
Data migration
In %13.3:
MR "Part 1 of Enforce NOT NULL external_diff_store on merge_request_diffs" !38549 (merged)
Why use a background migration (and a new partial index) to update existing rows WHERE external_diff_store IS NULL?
Because 19M of 93M rows have external_diff_store IS NULL.
-
Set a default external_diff_store:- Pre-deployment migration (because app servers must have the default in schema before the not null constraint is added post-deployment):
with_lock_retries do change_column_default :merge_request_diffs, :external_diff_store, 1 end - See !30769 (merged)
- Pre-deployment migration (because app servers must have the default in schema before the not null constraint is added post-deployment):
-
Add a partial index to support migrating NULLs:- Post-deployment migration adding a partial index on
merge_request_diffs.idWHERE external_diff_store IS NULL - See !28832 (merged)
- Post-deployment migration adding a partial index on
-
Add a not null constraint ignoring existing values: - Post-deployment migration:
add_not_null_constraint(:merge_request_diffs, :external_diff_store, validate: false) - See !31261 (diffs) but don't do the inline update. Instead, we need to do a background migration in the next MR.
- Post-deployment migration:
MR "Part 2 of Enforce NOT NULL external_diff_store on merge_request_diffs" !38719 (merged)
Why use a background migration to update existing rows WHERE external_diff_store IS NULL?
Because 19M of 93M rows have external_diff_store IS NULL.
-
Update existing NULLs to local store:- Post-deployment migration scheduling background jobs to update batches of merge_request_diffs with NULL external_diff_store to the LOCAL value
We can enable replication (via the normal process of default off, test on staging, default on) of merge_request_diffs in the same release as this MR, since replication of any rows with external_diff_store IS NULL will automatically resolve as the background migrations finish.
MR "Part 1 of Enforce NOT NULL file_store on packages_package_files" !38822 (merged)
The 13.3 merge_request_diffs work was split into 2 MRs for personal workflow purposes, but there is no need to for this one since it's basically copying the work.
-
Copy !38549 (merged) but for packages_package_files.file_store -
Copy !38719 (merged) but for packages_package_files.file_store
Replication of any rows with file_store IS NULL will resolve when the background migrations finish.
Additional context
gitlab-org/geo-team/discussions#4972 (comment 380277011):
if there are many records of a blob type with file_store null, then that type may need to wait for non-nullable file_store, which is non-trivial
From https://gitlab.slack.com/archives/C02PF508L/p1594859616391800:
- merge_request_diffs has 25M out of 90M records with
external_store IS NULL- terraform_states has 0 NULLs out of 453 records
- vulnerability_exports has 0 NULLs out of 0 records
I assume the latter two we can move forward with already. But it looks like MR diffs needs to wait for non-nullable
external_store(or at least after the BG migrations have been released).