Improve generic CompareByte.

Generic CompareByte can be improved in a way resembling !360 (merged). Things like PtrUint(ptr) := PtrUint(ptr) div 4 * 4 aren't necessary for that, but they give slightly better code as well, and I hope they are valid everywhere.
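
To make the idea concrete, here is a minimal sketch of a word-at-a-time comparison. It is not the code from CompareByte.patch: the names CompareByteWordwise and Sign are hypothetical, and it uses unaligned loads instead of the pointer-rounding trick mentioned above, so it assumes a target where unaligned reads are acceptable. It skips equal machine words first and only then falls back to a bytewise loop, which runs at most a few iterations.

```pascal
program CompareByteSketch;
{$mode objfpc}{$H+}

{ Hypothetical word-at-a-time comparison, for illustration only. }
function CompareByteWordwise(const a, b; len: SizeInt): SizeInt;
var
  pa, pb, stop, wordStop: PByte;
begin
  if len <= 0 then
    exit(0);
  pa := PByte(@a);
  pb := PByte(@b);
  stop := pa + len;
  { End of the longest prefix that is a whole number of machine words. }
  wordStop := pa + len div SizeOf(PtrUint) * SizeOf(PtrUint);
  { Skip equal words; stop at the first word that differs. }
  while (pa < wordStop) and
        (unaligned(PPtrUint(pa)^) = unaligned(PPtrUint(pb)^)) do
  begin
    inc(pa, SizeOf(PtrUint));
    inc(pb, SizeOf(PtrUint));
  end;
  { Bytewise scan: locates the differing byte inside the differing word,
    or compares the tail that is shorter than one word. }
  while pa < stop do
  begin
    if pa^ <> pb^ then
      exit(SizeInt(pa^) - SizeInt(pb^));
    inc(pa);
    inc(pb);
  end;
  result := 0;
end;

{ Only the sign of the result is compared against System.CompareByte. }
function Sign(x: SizeInt): integer;
begin
  if x < 0 then exit(-1);
  if x > 0 then exit(1);
  result := 0;
end;

var
  x, y: array[0..99] of byte;
  len, diffAt, mismatches: SizeInt;
begin
  mismatches := 0;
  for len := 0 to 100 do
    for diffAt := 0 to len do { diffAt = len means "no difference" }
    begin
      FillChar(x, SizeOf(x), $55);
      FillChar(y, SizeOf(y), $55);
      if diffAt < len then
        y[diffAt] := $AA;
      if Sign(CompareByteWordwise(x, y, len)) <> Sign(CompareByte(x, y, len)) then
        inc(mismatches);
      if Sign(CompareByteWordwise(y, x, len)) <> Sign(CompareByte(y, x, len)) then
        inc(mismatches);
    end;
  { A count of 0 means the sketch agrees with System.CompareByte on these inputs. }
  writeln('mismatches: ', mismatches);
end.
```

The self-check at the bottom only covers lengths up to 100 with a single differing byte, so it is a sanity test rather than a proof of equivalence.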

Patch: CompareByte.patch.

With the patch, the platform-specific implementations for i386 and x86_64 become slower than the generic one for me: i386 uses a bytewise loop and three REP CMPs (one would already be bad enough), and x86_64 uses a bytewise loop exclusively. So, unless someone comes up with an SSE version, I propose removing both of them as well.

Benchmark: CompareByte.pas.

My results. Note the row marked (!): the second byte already differs, but the i386 version sees len > 57 and issues three REP CMPs anyway.

x86-64/win64                                  i386/win32

CompareByteGeneric: 288 b                     CompareByteGeneric: 304 b
CompareByteGenericV2: 208 b                   CompareByteGenericV2: 176 b

Different byte #0 of 1                        Different byte #0 of 1
System.CompareByte:        1.8 ns/call        System.CompareByte:        2.2 ns/call
CompareByteGeneric:        3.0 ns/call        CompareByteGeneric:        2.9 ns/call
CompareByteGenericV2:      1.9 ns/call        CompareByteGenericV2:      3.0 ns/call

Different byte #7 of 8                        Different byte #7 of 8
System.CompareByte:        4.9 ns/call        System.CompareByte:        5.3 ns/call
CompareByteGeneric:        11 ns/call         CompareByteGeneric:        7.8 ns/call
CompareByteGenericV2:      4.4 ns/call        CompareByteGenericV2:      6.3 ns/call

Different byte #15 of 16                      Different byte #15 of 16
System.CompareByte:        7.8 ns/call        System.CompareByte:        8.9 ns/call
CompareByteGeneric:        19 ns/call         CompareByteGeneric:        9.4 ns/call
CompareByteGenericV2:      6.2 ns/call        CompareByteGenericV2:      8.0 ns/call

Different byte #23 of 24                      Different byte #23 of 24
System.CompareByte:        9.9 ns/call        System.CompareByte:        11 ns/call
CompareByteGeneric:        26 ns/call         CompareByteGeneric:        11 ns/call
CompareByteGenericV2:      6.4 ns/call        CompareByteGenericV2:      9.5 ns/call

Different byte #1 of 100                      Different byte #1 of 100
System.CompareByte:        1.8 ns/call        System.CompareByte:        40 ns/call   (!)
CompareByteGeneric:        5.2 ns/call        CompareByteGeneric:        5.8 ns/call
CompareByteGenericV2:      2.9 ns/call        CompareByteGenericV2:      4.3 ns/call

Different byte #99 of 100                     Different byte #99 of 100
System.CompareByte:        43 ns/call         System.CompareByte:        53 ns/call
CompareByteGeneric:        20 ns/call         CompareByteGeneric:        24 ns/call
CompareByteGenericV2:      10 ns/call         CompareByteGenericV2:      15 ns/call

Different byte #999 of 1000                   Different byte #999 of 1000
System.CompareByte:        288 ns/call        System.CompareByte:        163 ns/call
CompareByteGeneric:        87 ns/call         CompareByteGeneric:        199 ns/call
CompareByteGenericV2:      51 ns/call         CompareByteGenericV2:      93 ns/call
Edited by Rika