Improved CompareWord: generic, i386, x86-64.
I tried to improve the generic and i386 versions of CompareWord and add an x86-64 version.

Pros:
- They are faster. For me. At first only sometimes, now always.
- Did you know that the existing i386 implementation does not adhere to the documentation and returns ≷0 instead of the documented ±1, while the generic version makes a special effort for the latter?
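To make the difference between the two return conventions concrete, here is a minimal sketch. It is written in C purely for illustration and is not the RTL code; the names compare_word_raw and compare_word_normalized are hypothetical. The first function returns the raw difference of the first mismatching words, so a caller can rely only on its sign (the ≷0 behaviour). The second clamps the result to exactly -1, 0 or +1 (the ±1 behaviour described above as documented).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the raw difference of the first mismatching
   16-bit words, so only the sign of the result is meaningful. */
int32_t compare_word_raw(const uint16_t *a, const uint16_t *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (int32_t)a[i] - (int32_t)b[i];
    }
    return 0;
}

/* Hypothetical helper: same comparison, clamped to exactly -1, 0 or +1. */
int32_t compare_word_normalized(const uint16_t *a, const uint16_t *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}
```

Code that checks the result with `< 0` or `> 0` works with either convention; code that checks `= 1` or `= -1` works only with the second.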
Possible cons:
- Unlike the existing i386 version, I ignored the case when source pointers are misaligned by 1 byte. I heard that starting from Nehalem it hardly matters (and on Core 2 we’re talking about a measly ten percent), but if you have other information, I can try to do something clever, at the cost of code size... The generic version ignores it as well, though.
- I’m still not sure that pointers can be aligned the way I align them in the generic CompareWord (are non-flat pointers a thing?), but you have adopted a similar change in CompareByte, so it’s probably okay?
- Didn’t test on Linux.
- I don’t have an example that speeds up by 20% as with CompareByte. But I can provide a synthetic benchmark.
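The two alignment concerns in the list above can be illustrated with a small sketch. It is written in C purely for illustration, and words_until_aligned is a hypothetical name, not something taken from the patch. It assumes a flat address space in which a pointer converts to a linear integer address, which is exactly the assumption being questioned. It also shows why a pointer misaligned by 1 byte is a special case: stepping in 2-byte words from an odd address never reaches a wider boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit words to step over before p reaches
   an `align`-byte boundary (align must be a power of two, at least 2).
   Returns SIZE_MAX when p is at an odd address, because stepping in
   2-byte units can then never reach the boundary.
   Assumes a flat memory model where uintptr_t holds a linear address. */
size_t words_until_aligned(const void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    if (addr & 1u)
        return SIZE_MAX;
    size_t misalign = (size_t)(addr & (align - 1));
    return misalign == 0 ? 0 : (align - misalign) / 2;
}
```

A comparison loop built on this idea would handle that many leading words one at a time and then switch to wider loads. For an odd address it would have to either stay word-by-word or use unaligned wide loads throughout.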
My results. On i386, System.CompareWord is its current implementation. On x86-64, System.CompareWord and CompareWordGeneric are the same (especially if compiled with -O2, as I did), so close your eyes to the anomaly on #47/48.
i386                                    x86-64

CompareWordGeneric:   352 b             CompareWordGeneric:   320 b
CompareWordGenericV2: 208 b             CompareWordGenericV2: 224 b

Different byte #1 of 2                  Different byte #1 of 2
System.CompareWord:   2.2 ns/call       System.CompareWord:   2.9 ns/call
CompareWordGeneric:   3.7 ns/call       CompareWordGeneric:   3.1 ns/call
CompareWordGenericV2: 3.4 ns/call       CompareWordGenericV2: 2.6 ns/call
CompareWordAsm:       2.3 ns/call       CompareWordAsm:       3.0 ns/call

Different byte #7 of 8                  Different byte #7 of 8
System.CompareWord:   4.6 ns/call       System.CompareWord:   6.5 ns/call
CompareWordGeneric:   6.5 ns/call       CompareWordGeneric:   6.1 ns/call
CompareWordGenericV2: 5.3 ns/call       CompareWordGenericV2: 3.4 ns/call
CompareWordAsm:       4.3 ns/call       CompareWordAsm:       2.7 ns/call

Different byte #13 of 14                Different byte #13 of 14
System.CompareWord:   5.7 ns/call       System.CompareWord:   10 ns/call
CompareWordGeneric:   10 ns/call        CompareWordGeneric:   7.7 ns/call
CompareWordGenericV2: 7.8 ns/call       CompareWordGenericV2: 4.7 ns/call
CompareWordAsm:       4.6 ns/call       CompareWordAsm:       2.8 ns/call

Different byte #15 of 16                Different byte #15 of 16
System.CompareWord:   6.0 ns/call       System.CompareWord:   11 ns/call
CompareWordGeneric:   12 ns/call        CompareWordGeneric:   8.2 ns/call
CompareWordGenericV2: 7.9 ns/call       CompareWordGenericV2: 5.1 ns/call
CompareWordAsm:       5.6 ns/call       CompareWordAsm:       2.2 ns/call

Different byte #17 of 18                Different byte #17 of 18
System.CompareWord:   6.4 ns/call       System.CompareWord:   12 ns/call
CompareWordGeneric:   13 ns/call        CompareWordGeneric:   8.8 ns/call
CompareWordGenericV2: 8.0 ns/call       CompareWordGenericV2: 5.7 ns/call
CompareWordAsm:       4.9 ns/call       CompareWordAsm:       3.5 ns/call

Different byte #47 of 48                Different byte #47 of 48
System.CompareWord:   13 ns/call        System.CompareWord:   26 ns/call
CompareWordGeneric:   22 ns/call        CompareWordGeneric:   17 ns/call
CompareWordGenericV2: 12 ns/call        CompareWordGenericV2: 7.9 ns/call
CompareWordAsm:       7.6 ns/call       CompareWordAsm:       3.7 ns/call

Different byte #1 of 100                Different byte #1 of 100
System.CompareWord:   15 ns/call        System.CompareWord:   3.9 ns/call
CompareWordGeneric:   5.6 ns/call       CompareWordGeneric:   4.1 ns/call
CompareWordGenericV2: 5.2 ns/call       CompareWordGenericV2: 3.3 ns/call
CompareWordAsm:       2.6 ns/call       CompareWordAsm:       2.2 ns/call

Different byte #99 of 100               Different byte #99 of 100
System.CompareWord:   43 ns/call        System.CompareWord:   16 ns/call
CompareWordGeneric:   46 ns/call        CompareWordGeneric:   16 ns/call
CompareWordGenericV2: 18 ns/call        CompareWordGenericV2: 8.8 ns/call
CompareWordAsm:       12 ns/call        CompareWordAsm:       7.0 ns/call

Different byte #999 of 1000             Different byte #999 of 1000
System.CompareWord:   153 ns/call       System.CompareWord:   79 ns/call
CompareWordGeneric:   471 ns/call       CompareWordGeneric:   76 ns/call
CompareWordGenericV2: 136 ns/call       CompareWordGenericV2: 48 ns/call
CompareWordAsm:       81 ns/call        CompareWordAsm:       36 ns/call