MR diff migration: perform I/O outside of database transaction
What does this MR do?
Holding the database transaction open while we stream data to object storage or disk via carrierwave is not great. On the other hand, failed migrations leaving us in an inconsistent state is also not great.
I think we remain in a consistent state when just modifying the backing file for external diff stores, when migrating diffs to object storage. Nobody consults it until the merge_request_diffs
and merge_request_diff_files
rows are updated, so we can change it freely. This should allow the I/O work to be moved outside of the database transaction when migrating.
Screenshots
CarrierWave is a bit difficult to dig into, so I added a marker to the CarrierWave::Storage::Fog#store
method so I could see when traffic was happening.
Before
[2] pry(main)> d.migrate_files_to_external_storage!
(0.5ms) SELECT COUNT(*) FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
MergeRequestDiffFile Load (0.3ms) SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC
(0.2ms) BEGIN
MergeRequestDiffFile Destroy (0.4ms) DELETE FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
(6.8ms) INSERT INTO merge_request_diff_files ("new_file", "renamed_file", "deleted_file", "too_large", "a_mode", "b_mode", "new_path", "old_path", "binary", "merge_request_diff_id", "relative_order", "external_diff_offset", "external_diff_size")
VALUES (FALSE, FALSE, FALSE, FALSE, '100644', '100644', 'README.md', 'README.md', FALSE, 2725, 0, 0, 57)
MergeRequestDiff Update (0.4ms) UPDATE "merge_request_diffs" SET "updated_at" = '2020-07-01 12:11:26.256753', "external_diff" = 'diff-2725', "stored_externally" = TRUE WHERE "merge_request_diffs"."id" = 2725
==> store!
<== store!
MergeRequestDiff Update (1.2ms) UPDATE "merge_request_diffs" SET "external_diff_store" = 2 WHERE "merge_request_diffs"."id" = 2725
(1.7ms) COMMIT
MergeRequestDiffFile Load (1.0ms) SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC
After
[2] pry(main)> d.migrate_files_to_external_storage!
(0.7ms) SELECT COUNT(*) FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
MergeRequestDiffFile Load (0.6ms) SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC
==> store!
<== store!
(0.1ms) BEGIN
MergeRequestDiffFile Destroy (0.3ms) DELETE FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
(0.6ms) INSERT INTO merge_request_diff_files ("new_file", "renamed_file", "deleted_file", "too_large", "a_mode", "b_mode", "new_path", "old_path", "binary", "merge_request_diff_id", "relative_order", "external_diff_offset", "external_diff_size")
VALUES (FALSE, FALSE, FALSE, FALSE, '100644', '100644', 'README.md', 'README.md', FALSE, 2725, 0, 0, 57)
MergeRequestDiff Update (0.5ms) UPDATE "merge_request_diffs" SET "updated_at" = '2020-07-01 12:05:37.450132', "external_diff" = 'diff-2725', "stored_externally" = TRUE WHERE "merge_request_diffs"."id" = 2725
MergeRequestDiff Update (0.9ms) UPDATE "merge_request_diffs" SET "external_diff_store" = 2 WHERE "merge_request_diffs"."id" = 2725
(1.6ms) COMMIT
MergeRequestDiffFile Load (0.4ms) SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC
The difference is in when store
is called - inside or outside the BEGIN
block.
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Part of #216458 (closed)