Skip to content

MR diff migration: perform I/O outside of database transaction

Nick Thomas requested to merge 216458-migrate-mr-diffs-without-transaction into master

What does this MR do?

Holding the database transaction open while we stream data to object storage or disk via carrierwave is not great. On the other hand, failed migrations leaving us in an inconsistent state is also not great.

I think we remain in a consistent state when just modifying the backing file for external diff stores, when migrating diffs to object storage. Nobody consults it until the merge_request_diffs and merge_request_diff_files rows are updated, so we can change it freely. This should allow the I/O work to be moved outside of the database transaction when migrating.

Screenshots

CarrierWave is a bit difficult to dig into, so I added a marker to the CarrierWave::Storage::Fog#store method so I could see when traffic was happening.

Before

[2] pry(main)> d.migrate_files_to_external_storage!
   (0.5ms)  SELECT COUNT(*) FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
  MergeRequestDiffFile Load (0.3ms)  SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC
   (0.2ms)  BEGIN
  MergeRequestDiffFile Destroy (0.4ms)  DELETE FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
   (6.8ms)          INSERT INTO merge_request_diff_files ("new_file", "renamed_file", "deleted_file", "too_large", "a_mode", "b_mode", "new_path", "old_path", "binary", "merge_request_diff_id", "relative_order", "external_diff_offset", "external_diff_size")
        VALUES (FALSE, FALSE, FALSE, FALSE, '100644', '100644', 'README.md', 'README.md', FALSE, 2725, 0, 0, 57)

  MergeRequestDiff Update (0.4ms)  UPDATE "merge_request_diffs" SET "updated_at" = '2020-07-01 12:11:26.256753', "external_diff" = 'diff-2725', "stored_externally" = TRUE WHERE "merge_request_diffs"."id" = 2725
==> store!
<== store!
  MergeRequestDiff Update (1.2ms)  UPDATE "merge_request_diffs" SET "external_diff_store" = 2 WHERE "merge_request_diffs"."id" = 2725
   (1.7ms)  COMMIT
  MergeRequestDiffFile Load (1.0ms)  SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC

After

[2] pry(main)> d.migrate_files_to_external_storage!
   (0.7ms)  SELECT COUNT(*) FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
  MergeRequestDiffFile Load (0.6ms)  SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC
==> store!
<== store!
   (0.1ms)  BEGIN
  MergeRequestDiffFile Destroy (0.3ms)  DELETE FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725
   (0.6ms)          INSERT INTO merge_request_diff_files ("new_file", "renamed_file", "deleted_file", "too_large", "a_mode", "b_mode", "new_path", "old_path", "binary", "merge_request_diff_id", "relative_order", "external_diff_offset", "external_diff_size")
        VALUES (FALSE, FALSE, FALSE, FALSE, '100644', '100644', 'README.md', 'README.md', FALSE, 2725, 0, 0, 57)

  MergeRequestDiff Update (0.5ms)  UPDATE "merge_request_diffs" SET "updated_at" = '2020-07-01 12:05:37.450132', "external_diff" = 'diff-2725', "stored_externally" = TRUE WHERE "merge_request_diffs"."id" = 2725
  MergeRequestDiff Update (0.9ms)  UPDATE "merge_request_diffs" SET "external_diff_store" = 2 WHERE "merge_request_diffs"."id" = 2725
   (1.6ms)  COMMIT
  MergeRequestDiffFile Load (0.4ms)  SELECT "merge_request_diff_files".* FROM "merge_request_diff_files" WHERE "merge_request_diff_files"."merge_request_diff_id" = 2725 ORDER BY "merge_request_diff_files"."merge_request_diff_id" ASC, "merge_request_diff_files"."relative_order" ASC

The difference is in when store is called - inside or outside the BEGIN block.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Part of #216458 (closed)

Edited by Bob Van Landuyt

Merge request reports