
Improved CompareWord: generic, i386, x86-64.

Rika requested to merge runewalsh/source:compareword into main

I tried to improve the generic and i386 versions of CompareWord and to add an x86-64 version. Pros:

  1. They are faster. For me. Sometimes. Not always.

  2. Did you know that the existing i386 implementation does not adhere to the documentation? It returns an arbitrary negative or positive value (≷0) instead of the documented ±1, while the generic version makes a special effort to return ±1.
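For reference, here is a minimal C sketch of the documented contract (an illustration, not the RTL's Pascal code; `compare_word_ref` is a hypothetical name). It assumes words are compared as unsigned 16-bit values and returns exactly -1, 0 or 1 depending on the first differing word. The `(a > b) - (a < b)` idiom is one branch-free way to produce ±1; returning a raw difference instead would give the ≷0 behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference version of the documented CompareWord contract:
   compare len unsigned 16-bit words, return -1, 0 or 1 according to the
   first word that differs. */
int compare_word_ref(const uint16_t *buf1, const uint16_t *buf2, size_t len)
{
    assert(len == 0 || (buf1 != NULL && buf2 != NULL));
    for (size_t i = 0; i < len; i++) {
        if (buf1[i] != buf2[i])
            /* Maps "smaller"/"greater" to exactly -1/+1, never other values. */
            return (buf1[i] > buf2[i]) - (buf1[i] < buf2[i]);
    }
    return 0; /* all len words are equal */
}
```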

Possible cons:

  1. Unlike the existing i386 version, I ignored the case where the source pointers are misaligned by 1 byte. I’ve heard that starting from Nehalem it hardly matters (and on Core 2 we’re talking about a measly ten percent), but if you have other information, I can try to do something clever, at the cost of code size...

     The generic version ignores it as well, though.

  2. I’m still not sure that pointers can be aligned the way I align them in the generic CompareWord (are non-flat pointers a thing?), but you have adopted a similar change in CompareByte, so it’s probably okay?

  3. I didn’t test on Linux.

  4. I don’t have a real-world example that speeds up by 20%, like the one for CompareByte. But I can provide a synthetic benchmark:

CompareWordBenchmark.pas
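The cons above touch on pointer alignment and on ignoring 1-byte misalignment. As a generic illustration only (not the code in this MR, and in C rather than Pascal), a common way to speed up such a comparison is to test 8 bytes (4 words) per step for equality and fall back to word-by-word comparison only inside the chunk that differs. All loads go through `memcpy`, so the sketch is valid for any address, including buffers misaligned by 1 byte; how fast such unaligned loads actually are is the hardware question raised above. `compare_word_chunked` and `load16` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads one 16-bit word from an arbitrary (possibly odd) address. */
static uint16_t load16(const unsigned char *p)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Hypothetical sketch: compare len words, skipping equal 8-byte chunks.
   memcpy keeps every load valid regardless of alignment. */
int compare_word_chunked(const void *buf1, const void *buf2, size_t len)
{
    assert(len == 0 || (buf1 != NULL && buf2 != NULL));
    const unsigned char *p = buf1, *q = buf2;
    size_t i = 0;

    /* Fast path: 4 words per step, tested for equality only. */
    for (; i + 4 <= len; i += 4) {
        uint64_t x, y;
        memcpy(&x, p + 2 * i, sizeof x);
        memcpy(&y, q + 2 * i, sizeof y);
        if (x != y)
            break; /* the first difference lies in words i..i+3 */
    }

    /* Slow path: the differing chunk, or a tail of at most 3 words,
       compared as unsigned 16-bit values. */
    for (; i < len; i++) {
        uint16_t a = load16(p + 2 * i);
        uint16_t b = load16(q + 2 * i);
        if (a != b)
            return (a > b) - (a < b);
    }
    return 0;
}
```

Chunks are only compared for equality, never ordered: on little-endian x86 the first word in memory is the least significant part of the 64-bit value, so ordering whole chunks numerically would not match word order.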

My results.
On i386, System.CompareWord is the current implementation.
On x86-64, System.CompareWord and CompareWordGeneric are the same code (all the more so when compiled with -O2, as I did), so please close your eyes to the discrepancy between them at #47 of 48.
The first line of each column is code size in bytes.

i386                                          x86-64
CompareWordGeneric: 352 b                     CompareWordGeneric: 320 b
CompareWordGenericV2: 208 b                   CompareWordGenericV2: 224 b

Different byte #1 of 2                        Different byte #1 of 2
System.CompareWord:        2.2 ns/call        System.CompareWord:        2.9 ns/call
CompareWordGeneric:        3.7 ns/call        CompareWordGeneric:        3.1 ns/call
CompareWordGenericV2:      3.4 ns/call        CompareWordGenericV2:      2.6 ns/call
CompareWordAsm:            2.3 ns/call        CompareWordAsm:            3.0 ns/call

Different byte #7 of 8                        Different byte #7 of 8
System.CompareWord:        4.6 ns/call        System.CompareWord:        6.5 ns/call
CompareWordGeneric:        6.5 ns/call        CompareWordGeneric:        6.1 ns/call
CompareWordGenericV2:      5.3 ns/call        CompareWordGenericV2:      3.4 ns/call
CompareWordAsm:            4.3 ns/call        CompareWordAsm:            2.7 ns/call

Different byte #13 of 14                      Different byte #13 of 14
System.CompareWord:        5.7 ns/call        System.CompareWord:        10 ns/call
CompareWordGeneric:        10 ns/call         CompareWordGeneric:        7.7 ns/call
CompareWordGenericV2:      7.8 ns/call        CompareWordGenericV2:      4.7 ns/call
CompareWordAsm:            4.6 ns/call        CompareWordAsm:            2.8 ns/call

Different byte #15 of 16                      Different byte #15 of 16
System.CompareWord:        6.0 ns/call        System.CompareWord:        11 ns/call
CompareWordGeneric:        12 ns/call         CompareWordGeneric:        8.2 ns/call
CompareWordGenericV2:      7.9 ns/call        CompareWordGenericV2:      5.1 ns/call
CompareWordAsm:            5.6 ns/call        CompareWordAsm:            2.2 ns/call

Different byte #17 of 18                      Different byte #17 of 18
System.CompareWord:        6.4 ns/call        System.CompareWord:        12 ns/call
CompareWordGeneric:        13 ns/call         CompareWordGeneric:        8.8 ns/call
CompareWordGenericV2:      8.0 ns/call        CompareWordGenericV2:      5.7 ns/call
CompareWordAsm:            4.9 ns/call        CompareWordAsm:            3.5 ns/call

Different byte #47 of 48                      Different byte #47 of 48
System.CompareWord:        13 ns/call         System.CompareWord:        26 ns/call
CompareWordGeneric:        22 ns/call         CompareWordGeneric:        17 ns/call
CompareWordGenericV2:      12 ns/call         CompareWordGenericV2:      7.9 ns/call
CompareWordAsm:            7.6 ns/call        CompareWordAsm:            3.7 ns/call

Different byte #1 of 100                      Different byte #1 of 100
System.CompareWord:        15 ns/call         System.CompareWord:        3.9 ns/call
CompareWordGeneric:        5.6 ns/call        CompareWordGeneric:        4.1 ns/call
CompareWordGenericV2:      5.2 ns/call        CompareWordGenericV2:      3.3 ns/call
CompareWordAsm:            2.6 ns/call        CompareWordAsm:            2.2 ns/call

Different byte #99 of 100                     Different byte #99 of 100
System.CompareWord:        43 ns/call         System.CompareWord:        16 ns/call
CompareWordGeneric:        46 ns/call         CompareWordGeneric:        16 ns/call
CompareWordGenericV2:      18 ns/call         CompareWordGenericV2:      8.8 ns/call
CompareWordAsm:            12 ns/call         CompareWordAsm:            7.0 ns/call

Different byte #999 of 1000                   Different byte #999 of 1000
System.CompareWord:        153 ns/call        System.CompareWord:        79 ns/call
CompareWordGeneric:        471 ns/call        CompareWordGeneric:        76 ns/call
CompareWordGenericV2:      136 ns/call        CompareWordGenericV2:      48 ns/call
CompareWordAsm:            81 ns/call         CompareWordAsm:            36 ns/call
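Every row above has the form "different element #k of n". CompareWordBenchmark.pas is not reproduced here; the following is a hypothetical C harness in the same spirit, assuming #k is a 0-based word index (which the many "#(n-1) of n" rows suggest) and using a naive word loop as the function under test. `make_case`, `ns_per_call` and `compare_word_naive` are illustrative names, and `clock()` measures CPU time, which is adequate for a tight single-threaded loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef int (*compare_fn)(const void *, const void *, size_t);

/* Hypothetical baseline under test: plain word loop returning -1/0/1. */
static int compare_word_naive(const void *buf1, const void *buf2, size_t len)
{
    const unsigned char *p = buf1, *q = buf2;
    for (size_t i = 0; i < len; i++) {
        uint16_t a, b;
        memcpy(&a, p + 2 * i, sizeof a);
        memcpy(&b, q + 2 * i, sizeof b);
        if (a != b)
            return (a > b) - (a < b);
    }
    return 0;
}

/* Fills two n-word buffers that are equal except at word k (0-based). */
static void make_case(uint16_t *a, uint16_t *b, size_t n, size_t k)
{
    assert(k < n);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] = (uint16_t)(i * 7919u);
    b[k] ^= 1; /* the single differing word */
}

/* Average CPU time per call in nanoseconds, measured with clock(). */
static double ns_per_call(compare_fn fn, const uint16_t *a, const uint16_t *b,
                          size_t n, long iters)
{
    compare_fn volatile f = fn; /* re-read each iteration, so no hoisting */
    volatile int sink = 0;      /* keeps every result observable */
    clock_t t0 = clock();
    for (long it = 0; it < iters; it++)
        sink += f(a, b, n);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

For example, `make_case(a, b, 48, 47)` followed by `ns_per_call(compare_word_naive, a, b, 48, 10000000)` mirrors the "#47 of 48" row. Calling through a `volatile` function pointer keeps the compiler from hoisting the loop-invariant call out of the timing loop.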