Improve generic CompareByte.
Generic CompareByte can be improved in a way resembling !360 (merged). Things like PtrUint(ptr) := PtrUint(ptr) div 4 * 4 aren't necessary for that, but they give slightly better code as well, and I hope they are valid everywhere.
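For readers who have not seen !360, here is a minimal sketch of the general idea: compare a machine word at a time between aligned boundaries and fall back to bytes at the edges. It is not the code from CompareByte.patch. The name CompareByteSketch is hypothetical, and the sketch assumes the target tolerates an unaligned word read from the second buffer through FPC's unaligned() intrinsic.

```pascal
// Hypothetical illustration only; NOT the contents of CompareByte.patch.
function CompareByteSketch(const buf1, buf2; len: SizeInt): SizeInt;
const
  W = SizeOf(PtrUInt); // machine word size in bytes
var
  p1, p2, pEnd, pAlignedEnd: PByte;
begin
  CompareByteSketch := 0;
  if len <= 0 then
    exit;
  p1 := @buf1;
  p2 := @buf2;
  pEnd := p1 + len;
  if len >= 2 * W then
  begin
    // Head: compare bytewise until p1 sits on a word boundary.
    while PtrUInt(p1) mod W <> 0 do
    begin
      if p1^ <> p2^ then
        exit(SizeInt(p1^) - SizeInt(p2^));
      Inc(p1);
      Inc(p2);
    end;
    // Round the end pointer down to a word boundary (the "div W * W" trick).
    pAlignedEnd := PByte(PtrUInt(pEnd) div W * W);
    // Body: skip whole words while they are equal. p2 may be misaligned,
    // hence unaligned(); this is an assumption about the target.
    while (p1 < pAlignedEnd) and (PPtrUInt(p1)^ = unaligned(PPtrUInt(p2)^)) do
    begin
      Inc(p1, W);
      Inc(p2, W);
    end;
  end;
  // Tail: locate the differing byte (if any) in what is left.
  while p1 < pEnd do
  begin
    if p1^ <> p2^ then
      exit(SizeInt(p1^) - SizeInt(p2^));
    Inc(p1);
    Inc(p2);
  end;
end;
```

When the word loop stops on a mismatch, the tail loop only has to scan the bytes of that one word, so the bytewise work stays bounded by the word size plus the unaligned edges.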
Patch: CompareByte.patch.
It makes the platform-specific implementations for i386 and x86_64 worse than the generic one for me, as i386 uses a bytewise loop and three REP CMPs (one would already be bad enough), and x86_64 uses a bytewise loop exclusively. So, unless someone comes up with an SSE version, I propose to also remove both of them.
Benchmark: CompareByte.pas.
My results. Note the case marked (!), where the second byte already differs, but the i386 version sees len > 57 and issues three REP CMPs.
x86-64/win64                        i386/win32
CompareByteGeneric: 288 b           CompareByteGeneric: 304 b
CompareByteGenericV2: 208 b         CompareByteGenericV2: 176 b

Different byte #0 of 1              Different byte #0 of 1
System.CompareByte: 1.8 ns/call     System.CompareByte: 2.2 ns/call
CompareByteGeneric: 3.0 ns/call     CompareByteGeneric: 2.9 ns/call
CompareByteGenericV2: 1.9 ns/call   CompareByteGenericV2: 3.0 ns/call

Different byte #7 of 8              Different byte #7 of 8
System.CompareByte: 4.9 ns/call     System.CompareByte: 5.3 ns/call
CompareByteGeneric: 11 ns/call      CompareByteGeneric: 7.8 ns/call
CompareByteGenericV2: 4.4 ns/call   CompareByteGenericV2: 6.3 ns/call

Different byte #15 of 16            Different byte #15 of 16
System.CompareByte: 7.8 ns/call     System.CompareByte: 8.9 ns/call
CompareByteGeneric: 19 ns/call      CompareByteGeneric: 9.4 ns/call
CompareByteGenericV2: 6.2 ns/call   CompareByteGenericV2: 8.0 ns/call

Different byte #23 of 24            Different byte #23 of 24
System.CompareByte: 9.9 ns/call     System.CompareByte: 11 ns/call
CompareByteGeneric: 26 ns/call      CompareByteGeneric: 11 ns/call
CompareByteGenericV2: 6.4 ns/call   CompareByteGenericV2: 9.5 ns/call

Different byte #1 of 100            Different byte #1 of 100
System.CompareByte: 1.8 ns/call     System.CompareByte: 40 ns/call (!)
CompareByteGeneric: 5.2 ns/call     CompareByteGeneric: 5.8 ns/call
CompareByteGenericV2: 2.9 ns/call   CompareByteGenericV2: 4.3 ns/call

Different byte #99 of 100           Different byte #99 of 100
System.CompareByte: 43 ns/call      System.CompareByte: 53 ns/call
CompareByteGeneric: 20 ns/call      CompareByteGeneric: 24 ns/call
CompareByteGenericV2: 10 ns/call    CompareByteGenericV2: 15 ns/call

Different byte #999 of 1000         Different byte #999 of 1000
System.CompareByte: 288 ns/call     System.CompareByte: 163 ns/call
CompareByteGeneric: 87 ns/call      CompareByteGeneric: 199 ns/call
CompareByteGenericV2: 51 ns/call    CompareByteGenericV2: 93 ns/call
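
The attached CompareByte.pas is not reproduced here. As a rough sketch of how a ns/call figure like those above could be obtained, the helper below times repeated calls to System.CompareByte with GetTickCount64 from SysUtils. The name NsPerCall and the measurement method are assumptions for illustration, not necessarily what the benchmark does.

```pascal
uses
  SysUtils;

// Hypothetical helper; not taken from CompareByte.pas.
// Returns the average time per CompareByte call in nanoseconds.
function NsPerCall(const buf1, buf2; len, calls: SizeInt): Double;
var
  start, elapsedMs: QWord;
  i: SizeInt;
begin
  start := GetTickCount64;
  for i := 1 to calls do
    CompareByte(buf1, buf2, len); // result is deliberately discarded
  elapsedMs := GetTickCount64 - start;
  // Milliseconds to nanoseconds, averaged over all calls.
  NsPerCall := elapsedMs * 1.0e6 / calls;
end;
```

To approximate the "Different byte #1 of 100" row, one would fill two 100-byte arrays identically, change the byte at index 1 in one of them, and call NsPerCall with len = 100 and a call count large enough (tens of millions) that the millisecond timer resolution does not dominate.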
Edited by Rika