diff: Sort out behaviour of GetPatchID with binary diffs
When computing patch IDs in the GetPatchID RPC we pipe the output of
git diff $oldrev $newrev
into git patch-id --stable
. As we don't
pass the --binary
flag to git-diff(1), the diff would thus look like
the following for a binary diff:
diff --git a/file b/file
index 0858941..b52a4cb 100644
Binary files a/file and b/file differ
Notably, the diff only contains a reference to the pre- and post-image
blob IDs, but it does not contain the actual binary diff. And this is
good as computing binary diffs can be quite expensive, so we ultimately
don't want to pass --binary
. So ideally, Git would just fall back to
computing the patch ID based on the blob IDs.
As it turns out though this wasn't the case until Git v2.39.0, which has only just been released two weeks ago at the point of writing this. In previous versions Git would have computed the same patch ID for all such binary diffs regardless of the pre- and post-image blob IDs. This is of course not what we want in the context of GetPatchID, which is going to be used to determine whether there are effective changes between two versions of the same merge request.
This bug was fixed upstream via 0df19eb9d9 (builtin: patch-id: fix patch-id with binary diffs, 2022-10-24), which subsequently broke our CI as we now have different patch IDs depending on the Git version.
While we cannot help the fact that Git before v2.39.0 is broken, we can
at least document this in our codebase and double down on the expected
new behaviour. This is done by explicitly adding the --full-index
flag
to git-diff(1) as a means to state that we intend to compute binary
diffs via the pre- and post-image blobs instead of using the binary
diffs.
Changelog: fixed