Remove merge request diffs from the import
Problem to solve
Merge request diffs are (usually) stored in the database, and can be very large. They are included in project exports, which increases their size, as well as the amount of time an import takes to run.
The diffs themselves are duplicate data - the bundle stored in the project repository already has everything they contain.
Target audience
- Delaney, Development Team Lead, https://design.gitlab.com/research/personas#persona-delaney
- Sasha, Software Developer, https://design.gitlab.com/research/personas#persona-sasha
Further details
We already store references to every merge request version, using the refs/keep-around system, so the git repository is certain to have everything we're interested in, right?
Proposal
Stop including the merge_request_diff* tables in project exports, and stop importing them on project import. Instead, regenerate those tables, without data loss, from the git repository.
I think the easiest method would be to store the information about which versions are which in the refs/merge-requests reference hierarchy. For instance, we could have:
refs/merge-requests/1/versions/latest
refs/merge-requests/1/versions/1
refs/merge-requests/1/versions/2
# ...
(since we already have refs/merge-requests/1 as a file, this exact layout isn't possible, but you get the idea).
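The caveat comes from how git names refs: a ref cannot be both a ref and a "directory" containing other refs. A minimal Python sketch of that check follows. The helper name conflicts_with_existing and the second ref name are made up for illustration and are not part of the proposal.

```python
def conflicts_with_existing(new_ref, existing_refs):
    """Return True if new_ref cannot coexist with existing_refs because one
    name would have to be both a ref and a directory of refs."""
    for ref in existing_refs:
        if ref == new_ref:
            continue  # Same name: an update, not a directory/file conflict.
        if new_ref.startswith(ref + "/") or ref.startswith(new_ref + "/"):
            return True
    return False


existing = ["refs/merge-requests/1"]

# The layout from the example nests under an existing ref, so it conflicts.
print(conflicts_with_existing("refs/merge-requests/1/versions/1", existing))

# An illustrative (hypothetical) sibling hierarchy does not.
print(conflicts_with_existing("refs/merge-request-versions/1/1", existing))
```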
From this information, I believe we can reconstruct the entire merge_request_diffs table and its two children, merge_request_diff_commits and merge_request_diff_files, at project import time.
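As a rough sketch of the first step of that reconstruction, the Python below groups version refs by merge request and orders them by version number. It assumes the hypothetical refs/merge-requests/<iid>/versions/<n> layout from the example above, which does not exist today, and the function name group_versions is invented for illustration. Input is a list of (ref name, commit SHA) pairs, such as might be parsed from git for-each-ref output.

```python
import re
from collections import defaultdict

# Hypothetical ref layout taken from the proposal; not an existing convention.
VERSION_REF = re.compile(r"^refs/merge-requests/(\d+)/versions/(\d+)$")


def group_versions(refs):
    """Map each merge request IID to its version SHAs, ordered by version number.

    `refs` is an iterable of (ref_name, commit_sha) pairs, for example parsed
    from `git for-each-ref --format='%(refname) %(objectname)'`.
    """
    versions = defaultdict(dict)
    for name, sha in refs:
        match = VERSION_REF.match(name)
        if match is None:
            continue  # Skip `latest` and any unrelated refs.
        iid, number = int(match.group(1)), int(match.group(2))
        versions[iid][number] = sha
    return {
        iid: [by_number[n] for n in sorted(by_number)]
        for iid, by_number in versions.items()
    }
```

Each SHA in the result would be the starting point for rebuilding one merge_request_diffs row; the commit and file rows would still have to be computed from the repository itself.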
This also has positive characteristics for the project import itself - it becomes less vulnerable to security issues, conceptually simpler, and more resilient to churn in those tables as we push forward with more features, like external diffs.
We could also begin removing diffs from the database when an MR is closed or merged, without losing any of the information required to show that diff again in the future.
What does success look like, and how can we measure that?
Project export archives become smaller without any degradation of functionality.