Skip to content

diff: Sort out behaviour of GetPatchID with binary diffs

Patrick Steinhardt requested to merge pks-get-patch-id-binary-diffs into master

When computing patch IDs in the GetPatchID RPC we pipe the output of git diff $oldrev $newrev into git patch-id --stable. As we don't pass the --binary flag to git-diff(1), the diff would thus look like the following for a binary diff:

diff --git a/file b/file
index 0858941..b52a4cb 100644
Binary files a/file and b/file differ

Notably, the diff only contains a reference to the pre- and post-image blob IDs, but it does not contain the actual binary diff. And this is good as computing binary diffs can be quite expensive, so we ultimately don't want to pass --binary. So ideally, Git would just fall back to computing the patch ID based on the blob IDs.

As it turns out though this wasn't the case until Git v2.39.0, which has only just been released two weeks ago at the point of writing this. In previous versions Git would have computed the same patch ID for all such binary diffs regardless of the pre- and post-image blob IDs. This is of course not what we want in the context of GetPatchID, which is going to be used to determine whether there are effective changes between two versions of the same merge request.

This bug was fixed upstream via 0df19eb9d9 (builtin: patch-id: fix patch-id with binary diffs, 2022-10-24), which subsequently broke our CI as we now have different patch IDs depending on the Git version.

While we cannot help the fact that Git before v2.39.0 is broken, we can at least document this in our codebase and double down on the expected new behaviour. This is done by explicitly adding the --full-index flag to git-diff(1) as a means to state that we intend to compute binary diffs via the pre- and post-image blobs instead of using the binary diffs.

Changelog: fixed

Merge request reports