Dedupe `merge_request_diff_files`
`merge_request_diff_files` contains the entire diff history of every merge request, which means it grows very fast. Previously, we stored this in a serialised YAML column on `merge_request_diffs`, so we couldn't do much about that, but now we can.
The problem is this:
1. I create an MR adding files `a`, `b`, and `c`, each of which has 100 lines.
2. That creates three entries in `merge_request_diff_files`.
3. Someone points out that I was meant to add `d`, also with 100 lines.
4. I make that change and push, without changing `a`, `b`, or `c`.
5. We insert four more rows in `merge_request_diff_files`, with the first three only differing in their `merge_request_diff_id` and (potentially) `relative_order` columns.
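To make the duplication concrete, here is a rough Python model of the five steps above. It is only a sketch: `make_diff` and the dictionary-based rows are hypothetical stand-ins for the real table, and the diff format is invented for illustration.

```python
from collections import Counter


def make_diff(name: str, lines: int = 100) -> str:
    """Build a stand-in diff body for a newly added file (made-up format)."""
    return "".join(f"+{name} line {i}\n" for i in range(lines))


# Each push creates a new merge request diff containing every file in the MR.
versions = {
    1: ["a", "b", "c"],       # initial push
    2: ["a", "b", "c", "d"],  # second push only adds d
}

# Model of merge_request_diff_files today: one row per (diff, file),
# with the full diff text stored inline on every row.
rows = [
    {
        "merge_request_diff_id": diff_id,
        "relative_order": order,
        "diff": make_diff(name),
    }
    for diff_id, files in versions.items()
    for order, name in enumerate(files)
]

copies = Counter(row["diff"] for row in rows)
duplicate_rows = sum(count - 1 for count in copies.values())

print(len(rows))       # total rows stored
print(len(copies))     # distinct diff bodies
print(duplicate_rows)  # rows whose diff text is already stored on another row
```

In this model the second push stores seven rows in total for only four distinct diff bodies, so three rows carry text that already exists elsewhere in the table.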
Now that we have separate tables for this, we could normalise further by storing each distinct diff once, keyed by a hash of its contents, like this:
1. We create a new `merge_request_diff_file_contents` table with two columns:
   1. `diff` - the equivalent of `merge_request_diff_files.diff` now.
   2. `hash` - a hash (we can use whatever hash function makes most sense) of the `diff` column, which is indexed.
2. `merge_request_diff_files` loses the `diff` column, and gains a `merge_request_diff_file_contents_hash` foreign key instead.
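A minimal sketch of the proposed layout, using Python's `sqlite3` and `hashlib` purely for illustration (the real tables live in our production database and would be created through migrations). The choice of SHA-256, the composite primary key, and the `store_diff_file` helper are assumptions, not part of the proposal.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE merge_request_diff_file_contents (
        hash TEXT PRIMARY KEY,  -- indexed by virtue of being the primary key
        diff TEXT NOT NULL
    );
    CREATE TABLE merge_request_diff_files (
        merge_request_diff_id INTEGER NOT NULL,
        relative_order INTEGER NOT NULL,
        merge_request_diff_file_contents_hash TEXT NOT NULL
            REFERENCES merge_request_diff_file_contents (hash),
        PRIMARY KEY (merge_request_diff_id, relative_order)
    );
    """
)


def store_diff_file(diff_id: int, order: int, diff: str) -> str:
    """Store one file of one diff version, reusing identical diff text."""
    # SHA-256 is an assumption; the proposal leaves the hash function open.
    digest = hashlib.sha256(diff.encode("utf-8")).hexdigest()
    with conn:  # commit both inserts together
        # The contents row is only written if this hash has not been seen yet.
        conn.execute(
            "INSERT OR IGNORE INTO merge_request_diff_file_contents (hash, diff) "
            "VALUES (?, ?)",
            (digest, diff),
        )
        conn.execute(
            "INSERT INTO merge_request_diff_files VALUES (?, ?, ?)",
            (diff_id, order, digest),
        )
    return digest


# Replay the example: first push adds a, b, c; second push adds d.
diffs = {name: f"+{name}\n" * 100 for name in "abcd"}
for diff_id, files in {1: "abc", 2: "abcd"}.items():
    for order, name in enumerate(files):
        store_diff_file(diff_id, order, diffs[name])

file_rows = conn.execute("SELECT COUNT(*) FROM merge_request_diff_files").fetchone()[0]
content_rows = conn.execute(
    "SELECT COUNT(*) FROM merge_request_diff_file_contents"
).fetchone()[0]
print(file_rows, content_rows)
```

With this layout the example still produces seven rows in `merge_request_diff_files`, but they are small, and the diff text itself is stored only four times.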
This is basically reinventing part of git inside our database, but it's pretty simple. Migrating will be hard, though, and we just migrated to `merge_request_diff_files` in the first place :disappointed: