
Resizing a repo and running cleanup results in loss of commits and changes details in merge requests - does it need to be this way?


Proposal

We have a long-established process in our docs for reducing the size of a repository by removing one or more objects from the commit history with the git filter-repo utility. The commit that introduced the object, and every commit descended from it, is recreated as a new commit with a new SHA, and every reference to the target object is removed. The object can then be pruned from the repository, reducing its size.
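Every later commit is rewritten because of Git's content addressing: a commit's ID covers its tree and its parent's ID. The Python sketch below is a toy model built on that assumption, not git filter-repo itself. The tree and commit formats are simplified, and all function names are hypothetical.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    # In SHA-1 repositories, Git names each object by SHA-1("<type> <size>\0<content>").
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def toy_tree_id(files: dict[str, bytes]) -> str:
    # Simplified stand-in for a Git tree (not Git's binary tree format):
    # one "path blob-id" line per file, sorted by path.
    lines = [f"{path} {git_object_id('blob', files[path])}" for path in sorted(files)]
    return git_object_id("tree", "\n".join(lines).encode())


def toy_history(snapshots: list[dict[str, bytes]]) -> list[str]:
    # Build a linear history where each toy commit records its tree and parent ID
    # (real commits also carry author/committer lines, omitted here).
    ids: list[str] = []
    parent = None
    for n, files in enumerate(snapshots):
        body = f"tree {toy_tree_id(files)}\n"
        if parent is not None:
            body += f"parent {parent}\n"
        body += f"\ncommit {n}\n"
        parent = git_object_id("commit", body.encode())
        ids.append(parent)
    return ids


def without_path(snapshots, path):
    # Drop one path from every snapshot, loosely what filtering a path out of history does.
    return [{p: c for p, c in files.items() if p != path} for files in snapshots]


history = [
    {"README.md": b"hello\n"},
    {"README.md": b"hello\nv2\n", "big.bin": b"\0" * 1024},  # big.bin introduced here
    {"README.md": b"hello\nv3\n", "big.bin": b"\0" * 1024},
    {"README.md": b"hello\nv3\n"},  # big.bin deleted; this tree is unchanged by filtering
]
before = toy_history(history)
after = toy_history(without_path(history, "big.bin"))
for old, new in zip(before, after):
    print(old[:8], "->", new[:8], "(unchanged)" if old == new else "(rewritten)")
```

Only the first commit, which predates the object, keeps its ID. The last commit has an identical tree before and after filtering, but it still gets a new ID because its parent changed.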

This process works as expected when the removed file is not involved in a merge request, e.g. a file added to an unmerged feature branch. The object is removed from the Git repository on the server, reducing the storage used both on the server and in local clones of the repository.

However, when the process is applied to an object that was introduced via a merge request, the internal refs cleanup causes every MR related to the deleted object to lose its Commits and Changes details in the UI. That cleanup removes the merge-requests and keep-around refs from the repository so that the target object can be pruned. As part of it, the RepositoryCleanupWorker deletes the merge_request_diffs records, and related table records, for the affected MRs, selecting them by the original (now replaced) commit SHAs.
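The selection shown in the logged queries under Notes amounts roughly to this: a diff is deleted if it belongs to an MR targeting the project and lists any of the old commit SHAs passed in. The Python model below is a hypothetical illustration of that filter, not GitLab's Ruby implementation. The class, function, and placeholder SHAs are all made up.

```python
from dataclasses import dataclass, field


@dataclass
class MergeRequestDiff:
    # Hypothetical in-memory stand-in for a merge_request_diffs row joined to
    # its merge request and its merge_request_diff_commits rows.
    id: int
    merge_request_id: int
    target_project_id: int
    commit_shas: list[str] = field(default_factory=list)


def diffs_to_delete(diffs, project_id, rewritten_shas):
    """IDs of diffs in the project that reference any rewritten (old) commit SHA."""
    rewritten = set(rewritten_shas)
    return sorted(
        d.id
        for d in diffs
        if d.target_project_id == project_id and rewritten.intersection(d.commit_shas)
    )


# Placeholder data for illustration only.
diffs = [
    MergeRequestDiff(1, 100, 23, ["old-a", "kept-b"]),  # references a rewritten commit
    MergeRequestDiff(2, 101, 23, ["kept-c"]),           # unaffected
    MergeRequestDiff(3, 102, 99, ["old-a"]),            # different project
]
print(diffs_to_delete(diffs, 23, {"old-a"}))
```

A single matching SHA is enough to select the whole diff, which is why an MR loses all of its Commits and Changes data rather than just the rewritten commits.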

Given that the commit history has been rewritten such that only references to the deleted object are removed, might it be possible for the cleanup process instead to update the commit SHAs stored in the database to point to the new commits, and regenerate the diffs if necessary, so that MR commits and changes data remain available?
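One possible starting point: git filter-repo writes a commit-map file (under .git/filter-repo/) that maps old commit IDs to new ones, with a null SHA for commits it pruned entirely. The sketch below assumes that two-column "old new" format and remaps stored SHAs instead of deleting the diffs. The function names and SHAs are hypothetical, and regenerating the diffs themselves is not covered.

```python
NULL_SHA = "0" * 40  # assumed marker for a commit that filter-repo pruned entirely


def parse_commit_map(text: str) -> dict[str, str]:
    """Parse 'old new' lines from a filter-repo commit-map, skipping the header."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts == ["old", "new"]:
            continue  # header, blank, or malformed line
        old, new = parts
        mapping[old] = new
    return mapping


def remap_commit_shas(shas: list[str], mapping: dict[str, str]):
    """Return (remapped SHAs, SHAs whose commits no longer exist)."""
    remapped, dropped = [], []
    for sha in shas:
        new = mapping.get(sha, sha)  # SHAs absent from the map are assumed unchanged
        if new == NULL_SHA:
            dropped.append(sha)
        else:
            remapped.append(new)
    return remapped, dropped


# Placeholder SHAs for illustration only.
a, b, c, d = ("a" * 40, "b" * 40, "c" * 40, "d" * 40)
commit_map = parse_commit_map(f"old new\n{a} {b}\n{c} {NULL_SHA}\n")
print(remap_commit_shas([a, c, d], commit_map))
```

Commits that were pruned outright have no new SHA to point to. How to handle them remains part of the open question.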

Or is this too complex to consider and the loss of MR commits and changes detail unavoidable?

I've created a docs MR to add a warning about what currently happens when cleanup is run - ideally this wouldn't need to be there.

Notes

Here is an example of the queries run during cleanup which remove the merge_request_diffs records:

```
2023-06-07_05:53:16.41105 LOG:  execute : /*application:sidekiq,correlation_id:01H2A6N7VBMTHKKDPR8Y1KW1QB,jid:2afac763b9cdc896dfb0092b,endpoint_id:RepositoryCleanupWorker,db_config_name:main*/ SELECT "merge_request_diffs".* FROM "merge_request_diffs" INNER JOIN "merge_requests" ON "merge_requests"."id" = "merge_request_diffs"."merge_request_id" INNER JOIN "merge_request_diff_commits" ON "merge_request_diff_commits"."merge_request_diff_id" = "merge_request_diffs"."id" WHERE "merge_requests"."target_project_id" = 23 AND "merge_request_diff_commits"."sha" IN ('\x65544c0aa9b410ffbdaf986e95049a1fb103dd98', '\x917db894c57e6b11477aff510a3dc4d29acf6f0b', '\x3d53c58a1ac862157abeb3f37eb49c9c8b9f87c2', '\xbc8067eed8ed69a25314fc1ddb199c0692fb590c', '\x97365c56e699e4f3199fa63839a89aa3c82359a9', '\xe04758674f4a52b4e507407f5bc6c3e6f60aa3fb', '\xc8ac4601563e149359332ac4eae8eb280694b924', '\xadef60e7985e27bd31adf1eba4ce33eaeb56c9e4', '\xcbe28e64ab49454c29285c864faa206c2b528817', '\x8df2f51135935a210f66d2dd3d76c2fc6ced8dad', '\x0da11abf89480798edb7adbb065cb340a6ffe84e', '\xc971350d3c9033dd8ee024e9ad9d8dfb153de47a', '\x8454630aa80dc5ed0fd3acbbeb54fbee9e6247ba', '\x30f290dc686ee5cccce9fa77b7f2959d3af6c976', '\x247bfd220223b5113c6bc38e7f5bbebe89248c86', '\xc3056223b6e9ab4055b63e48e1bbd5120857ce21', '\x172b72054b2a53a1485727af81cf1762576fe2d4', '\x74f83e5d9f7ed16e85f499fdd893d06ce9368b26')
 
2023-06-07_05:53:16.42060 LOG:  execute : /*application:sidekiq,correlation_id:01H2A6N7VBMTHKKDPR8Y1KW1QB,jid:2afac763b9cdc896dfb0092b,endpoint_id:RepositoryCleanupWorker,db_config_name:main*/ DELETE FROM "merge_request_diffs" WHERE "merge_request_diffs"."id" = 4192
2023-06-07_05:53:16.42173 LOG:  execute : /*application:sidekiq,correlation_id:01H2A6N7VBMTHKKDPR8Y1KW1QB,jid:2afac763b9cdc896dfb0092b,endpoint_id:RepositoryCleanupWorker,db_config_name:main*/ DELETE FROM "merge_request_diffs" WHERE "merge_request_diffs"."id" = 4187
2023-06-07_05:53:16.42568 LOG:  execute : /*application:sidekiq,correlation_id:01H2A6N7VBMTHKKDPR8Y1KW1QB,jid:2afac763b9cdc896dfb0092b,endpoint_id:RepositoryCleanupWorker,db_config_name:main*/ DELETE FROM "merge_request_diffs" WHERE "merge_request_diffs"."id" = 4189
```