Improve generic CompareByte.
Generic `CompareByte` can be improved in a way resembling !360. (Tricks like `PtrUint(ptr) := PtrUint(ptr) div 4 * 4` aren't strictly necessary for that, but they produce slightly better code too, and I hope they are valid on every platform.) Patch: [CompareByte.patch](/uploads/b7987627caf8b794ae56d98d683c4d28/CompareByte.patch).

With this patch, the generic version beats the platform-specific `i386` and `x86_64` implementations in my measurements. `i386` uses a bytewise loop plus **three** `REP CMP`s (even one would be bad enough), and `x86_64` uses a bytewise loop exclusively. So, unless someone comes up with an SSE version, I propose to also remove [both](https://gitlab.com/freepascal.org/fpc/source/-/blob/55deefbab5a5f3f203587cfdb1f065251d3321f4/rtl/i386/i386.inc#L467) of [them](https://gitlab.com/freepascal.org/fpc/source/-/blob/55deefbab5a5f3f203587cfdb1f065251d3321f4/rtl/x86_64/x86_64.inc#L634).

Benchmark: [CompareByte.pas](/uploads/cccda7cef3d1427d60e5476988c32843/CompareByte.pas). My results are below. Note the case marked `(!)`: the second byte already differs, but the `i386` version sees `len > 57` and issues three `REP CMP`s anyway.
```
x86-64/win64                            i386/win32
CompareByteGeneric: 288 b               CompareByteGeneric: 304 b
CompareByteGenericV2: 208 b             CompareByteGenericV2: 176 b

Different byte #0 of 1                  Different byte #0 of 1
System.CompareByte: 1.8 ns/call         System.CompareByte: 2.2 ns/call
CompareByteGeneric: 3.0 ns/call         CompareByteGeneric: 2.9 ns/call
CompareByteGenericV2: 1.9 ns/call       CompareByteGenericV2: 3.0 ns/call

Different byte #7 of 8                  Different byte #7 of 8
System.CompareByte: 4.9 ns/call         System.CompareByte: 5.3 ns/call
CompareByteGeneric: 11 ns/call          CompareByteGeneric: 7.8 ns/call
CompareByteGenericV2: 4.4 ns/call       CompareByteGenericV2: 6.3 ns/call

Different byte #15 of 16                Different byte #15 of 16
System.CompareByte: 7.8 ns/call         System.CompareByte: 8.9 ns/call
CompareByteGeneric: 19 ns/call          CompareByteGeneric: 9.4 ns/call
CompareByteGenericV2: 6.2 ns/call       CompareByteGenericV2: 8.0 ns/call

Different byte #23 of 24                Different byte #23 of 24
System.CompareByte: 9.9 ns/call         System.CompareByte: 11 ns/call
CompareByteGeneric: 26 ns/call          CompareByteGeneric: 11 ns/call
CompareByteGenericV2: 6.4 ns/call       CompareByteGenericV2: 9.5 ns/call

Different byte #1 of 100                Different byte #1 of 100
System.CompareByte: 1.8 ns/call         System.CompareByte: 40 ns/call (!)
CompareByteGeneric: 5.2 ns/call         CompareByteGeneric: 5.8 ns/call
CompareByteGenericV2: 2.9 ns/call       CompareByteGenericV2: 4.3 ns/call

Different byte #99 of 100               Different byte #99 of 100
System.CompareByte: 43 ns/call          System.CompareByte: 53 ns/call
CompareByteGeneric: 20 ns/call          CompareByteGeneric: 24 ns/call
CompareByteGenericV2: 10 ns/call        CompareByteGenericV2: 15 ns/call

Different byte #999 of 1000             Different byte #999 of 1000
System.CompareByte: 288 ns/call         System.CompareByte: 163 ns/call
CompareByteGeneric: 87 ns/call          CompareByteGeneric: 199 ns/call
CompareByteGenericV2: 51 ns/call        CompareByteGenericV2: 93 ns/call
```
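The general word-at-a-time idea can be sketched in Free Pascal roughly as follows. This is a minimal, hypothetical illustration, not the code from CompareByte.patch: the name `CompareByteWordwise` is made up, and it loads words through `Move` instead of aligning pointers, so it is portable but not tuned. It compares one `PtrUInt` per step and falls back to a bytewise loop only for the tail or for the word that differs. That is where a speedup over a pure bytewise loop would be expected to come from. The main block checks the result sign against `System.CompareByte` and confirms that `x div 4 * 4` rounds down to a multiple of 4, the same as `x and not 3` for unsigned values.

```pascal
program CompareByteSketch;

{$mode objfpc}

type
  TBuf = array[0..99] of Byte;

{ Hypothetical word-at-a-time CompareByte (illustration only, not the patch).
  Like System.CompareByte, it returns <0, 0 or >0. }
function CompareByteWordwise(const buf1, buf2; len: SizeInt): SizeInt;
var
  p1, p2: PByte;
  w1, w2: PtrUInt;
begin
  p1 := @buf1;
  p2 := @buf2;
  { Compare one machine word per step while a full word remains. Loads go
    through Move, so this stays safe on CPUs that fault on unaligned reads. }
  while len >= SizeOf(PtrUInt) do
  begin
    Move(p1^, w1, SizeOf(PtrUInt));
    Move(p2^, w2, SizeOf(PtrUInt));
    if w1 <> w2 then
      Break; { the first difference lies inside this word }
    Inc(p1, SizeOf(PtrUInt));
    Inc(p2, SizeOf(PtrUInt));
    Dec(len, SizeOf(PtrUInt));
  end;
  { Bytewise pass over the tail, or over the word that differed, to find the
    first differing byte; its order decides the sign of the result. }
  while len > 0 do
  begin
    if p1^ <> p2^ then
      Exit(SizeInt(p1^) - SizeInt(p2^));
    Inc(p1);
    Inc(p2);
    Dec(len);
  end;
  Result := 0;
end;

var
  a, b: TBuf;
  k: SizeInt;
  addr: PtrUInt;
begin
  FillChar(a, SizeOf(a), 7);
  { Every prefix of two equal buffers compares equal. }
  for k := 0 to Length(a) do
    if CompareByteWordwise(a, a, k) <> 0 then
      Halt(1);
  { Place one differing byte at every position and check the sign
    against System.CompareByte. }
  for k := 0 to High(a) do
  begin
    b := a;
    b[k] := 9; { b is greater at position k }
    if (CompareByteWordwise(a, b, Length(a)) >= 0) or
       (CompareByte(a, b, Length(a)) >= 0) then
      Halt(2);
    b[k] := 5; { b is smaller at position k }
    if (CompareByteWordwise(a, b, Length(a)) <= 0) or
       (CompareByte(a, b, Length(a)) <= 0) then
      Halt(3);
  end;
  { x div 4 * 4 rounds down to a multiple of 4, i.e. clears the low two bits. }
  for k := 0 to 4096 do
  begin
    addr := PtrUInt(k);
    if (addr div 4 * 4) <> (addr and not PtrUInt(3)) then
      Halt(4);
  end;
  Writeln('ok');
end.
```

A tuned version would presumably align at least one pointer (for example with the `div 4 * 4` rounding from the issue text) and read words directly instead of through `Move`; the sketch leaves that out to keep the control flow easy to follow.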