Geo: Improve query performance of the latest blob types when object storage is not synced
Follow up for #224634 (comment 380104627) / !36771 (comment 380206862):
It sounds like it is reasonable to move forward with a nullable
packages_package_files.file_store
column for now, since there is only 1 record withfile_store
NULL
in production, it is an exceptional case rather than a normal one. I'll open a follow up issue to make all thefile_store
columns non-nullable for Geo queries.
When sync_object_storage
is false, RegistryConsistencyService
calls methods which use the clause WHERE file_store = 1
. We need to:
- index that field.
- make it non-nullable.
- So we don't have to add
OR file_store IS NULL
to ensure we process all records, which harms performance (see the performance improvement when we removed this clause from LFS objects).
- So we don't have to add
Proposal
-
Add index on packages_package_files.file_store
=> !36771 (merged) -
Add index on merge_request_diffs.external_diff_store
=> !37265 (merged) -
Add index on terraform_states.file_store
=> !37265 (merged) -
Add index on vulnerability_exports.file_store
=> !37265 (merged) -
Make field non-nullable: packages_package_files.file_store
-
Make field non-nullable: merge_request_diffs.external_diff_store
- [-]
Make field non-nullable:The Terraform State data model is changing, and the file_store field will be non-nullableterraform_states.file_store
- [ ] Make field non-nullable: vulnerability_exports.file_store
A wrinkle: "Make field non-nullable" was a multi-release process for attachments/job artifacts/lfs objects: #213382 (closed)
Data migration
In %13.3:
!38549 (merged)
MR "Part 1 of Enforce NOT NULL external_diff_store on merge_request_diffs"Why use a background migration (and a new partial index) to update existing rows WHERE external_diff_store IS NULL
?
Because 19M of 93M rows have external_diff_store IS NULL
.
-
Set a default external_diff_store
:- Pre-deployment migration (because app servers must have the default in schema before the not null constraint is added post-deployment):
with_lock_retries do change_column_default :merge_request_diffs, :external_diff_store, 1 end
- See !30769 (merged)
- Pre-deployment migration (because app servers must have the default in schema before the not null constraint is added post-deployment):
-
Add a partial index to support migrating NULL
s:- Post-deployment migration adding a partial index on
merge_request_diffs.id
WHERE external_diff_store IS NULL
- See !28832 (merged)
- Post-deployment migration adding a partial index on
-
Add a not null constraint ignoring existing values: - Post-deployment migration:
add_not_null_constraint(:merge_request_diffs, :external_diff_store, validate: false)
- See !31261 (diffs) but don't do the inline update. Instead, we need to do a background migration in the next MR.
- Post-deployment migration:
!38719 (merged)
MR "Part 2 of Enforce NOT NULL external_diff_store on merge_request_diffs"Why use a background migration to update existing rows WHERE external_diff_store IS NULL
?
Because 19M of 93M rows have external_diff_store IS NULL
.
-
Update existing NULL
s to local store:- Post-deployment migration scheduling background jobs to update batches of merge_request_diffs with NULL external_diff_store to the LOCAL value
We can enable replication (via the normal process of default off, test on staging, default on) of merge_request_diffs
in the same release as this MR, since replication of any rows with external_diff_store IS NULL
will automatically resolve as the background migrations finish.
!38822 (merged)
MR "Part 1 of Enforce NOT NULL file_store on packages_package_files"The 13.3 merge_request_diffs work was split into 2 MRs for personal workflow purposes, but there is no need to for this one since it's basically copying the work.
-
Copy !38549 (merged) but for packages_package_files.file_store -
Copy !38719 (merged) but for packages_package_files.file_store
Replication of any rows with file_store IS NULL
will resolve when the background migrations finish.
Additional context
gitlab-org/geo-team/discussions#4972 (comment 380277011):
if there are many records of a blob type with file_store null, then that type may need to wait for non-nullable file_store, which is non-trivial
From https://gitlab.slack.com/archives/C02PF508L/p1594859616391800:
- merge_request_diffs has 25M out of 90M records with
external_store IS NULL
- terraform_states has 0 NULLs out of 453 records
- vulnerability_exports has 0 NULLs out of 0 records
I assume the latter two we can move forward with already. But it looks like MR diffs needs to wait for non-nullable
external_store
(or at least after the BG migrations have been released).