
Clean up migration to populate commit users

What does this MR do and why?

This cleans up any pending MigrateMergeRequestDiffCommitUsers migration jobs. In addition, we stop using the old columns, ignore them, and remove them in a post-deployment migration.

We don't need a separate release for the removal of the columns: because the deployed code already ignores them, it's safe to remove the columns in a post-deployment migration. In addition, the code has supported reading from and writing to the new data for roughly the last two months.

With this change we should start seeing slower growth in the merge_request_diff_commits table, as we no longer store many duplicate names and email addresses. Once we find a way to reclaim unused space in the table, we should be able to free up roughly 500 GB of disk space, as mentioned in the original issue (#331823 (closed)).
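To illustrate why the deduplication reduces growth, here is a minimal plain-Ruby sketch of the idea: each distinct name/email pair is stored once and commits reference it by ID. This is not the actual GitLab code. The `CommitUserStore` class, the sample commits, and the `commit_author_id`/`committer_id` key names are assumptions made for the example.

```ruby
# Hypothetical in-memory stand-in for the merge_request_diff_commit_users
# table: each distinct (name, email) pair is stored once and referenced by ID.
class CommitUserStore
  def initialize
    @ids = {}
  end

  # Returns the ID for the pair, assigning a new one only the first time
  # the pair is seen.
  def find_or_create(name, email)
    @ids[[name, email]] ||= @ids.size + 1
  end

  def size
    @ids.size
  end
end

store = CommitUserStore.new

# Made-up sample data in the old layout: names and emails repeated per commit.
commits = [
  { sha: 'a1', author: ['Alice', 'alice@example.com'], committer: ['Alice', 'alice@example.com'] },
  { sha: 'b2', author: ['Alice', 'alice@example.com'], committer: ['Bob', 'bob@example.com'] },
  { sha: 'c3', author: ['Bob', 'bob@example.com'], committer: ['Bob', 'bob@example.com'] }
]

# New layout: each commit row only carries two integer references.
rows = commits.map do |commit|
  {
    sha: commit[:sha],
    commit_author_id: store.find_or_create(*commit[:author]),
    committer_id: store.find_or_create(*commit[:committer])
  }
end

puts "commit rows: #{rows.size}, distinct users stored: #{store.size}"
```

In this toy data set the old layout would hold twelve text values (four per commit), while the new layout holds two user entries plus two integer references per commit.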

This fixes #334394 (closed)

Migration output

== 20211012134316 CleanUpMigrateMergeRequestDiffCommitUsers: migrating ========
== 20211012134316 CleanUpMigrateMergeRequestDiffCommitUsers: migrated (0.0033s)

== 20211012143815 RemoveMergeRequestDiffCommitColumns: migrating ==============
-- remove_column(:merge_request_diff_commits, :author_name, :text)
   -> 0.0014s
-- remove_column(:merge_request_diff_commits, :author_email, :text)
   -> 0.0004s
-- remove_column(:merge_request_diff_commits, :committer_name, :text)
   -> 0.0004s
-- remove_column(:merge_request_diff_commits, :committer_email, :text)
   -> 0.0003s
== 20211012143815 RemoveMergeRequestDiffCommitColumns: migrated (0.0026s) =====

A note about rollbacks

Rolling back these migrations is not implemented. The table we are migrating to is just under 800 MB for GitLab.com, and the table we are migrating from is over 2 TB. The original migration took almost two months, and required several adjustments to the migration process along the way. Restoring data in a migration rollback would likely take months to run. As a result, once migrated there's no rolling back of these data changes. The column removals can be reverted, but you'd be left with empty columns.

Fortunately this should be fine: when rolling back to something before this merge request, you'll roll back to code that supports reading/writing of both the old and new data. Since new data already exists at that point, there won't be a loss of data.

What isn't possible is rolling back both this merge request and the one that originally introduced the migration, as you'd lose both the data in the new merge_request_diff_commit_users table and the old text-based columns. Since that migration was introduced back in 14.1, this shouldn't pose any problems.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Yorick Peterse
