Remove / disable platform-specific CompareByte on i386 and x86-64.
For me, #40120 (closed) made the generic CompareByte faster, sometimes by a lot, than its platform-specific implementations for i386 and x86-64. The effect and its reasons are more pronounced than in #40119 (closed), which just uses REP SCAS: i386.inc:CompareByte with len > 57 uses three REP CMPS, paying the REP CMPS startup cost three times (especially bad if the difference is in the first bytes, which is often the case), and x86_64.inc:CompareByte is just a bytewise loop, efficient for tiny arrays but falling further behind the generic version as the input grows, up to nearly sizeof(PtrUint) / sizeof(byte)× slower.
See #40120 (closed) itself for the benchmark.
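
To illustrate where a factor close to sizeof(PtrUint) / sizeof(byte) can come from, here is a minimal sketch in C of a bytewise loop next to a word-at-a-time comparison. It is not the RTL code from #40120; the function names are hypothetical, and it only assumes that the generic version compares one pointer-sized word per iteration and falls back to bytes to locate the difference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, for illustration only; not the FPC RTL implementation. */

/* One byte per iteration, like the loop described for x86_64.inc. */
static int compare_bytewise(const unsigned char *a, const unsigned char *b,
                            size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* One pointer-sized word per iteration while the inputs match.
   The first mismatching word and any tail shorter than a word
   are resolved by the bytewise loop. */
static int compare_wordwise(const unsigned char *a, const unsigned char *b,
                            size_t len)
{
    size_t i = 0;
    while (len - i >= sizeof(uintptr_t)) {
        uintptr_t wa, wb;
        memcpy(&wa, a + i, sizeof wa); /* memcpy avoids unaligned reads */
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            break;
        i += sizeof(uintptr_t);
    }
    return compare_bytewise(a + i, b + i, len - i);
}
```

On long equal prefixes the second function does roughly sizeof(uintptr_t) times fewer loop iterations, while on a difference in the first bytes both do about the same work.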
Maybe disable the platform-specific versions unless someone comes up with a better solution?