Remove / disable platform-specific CompareByte on i386 and x86-64.
For me, #40120 (closed) made the generic CompareByte faster (sometimes much faster) than its i386 and x86-64 implementations, and the effect and its causes are more pronounced than in #40119 (closed), which just uses REP SCAS. i386.inc:CompareByte with len > 57 issues three REP CMPS, paying the REP CMPS startup cost three times; this is especially bad when the inputs differ in their first bytes, which is often the case. x86_64.inc:CompareByte is just a bytewise loop: efficient for tiny arrays, but it falls further behind as the input grows, approaching a slowdown of sizeof(PtrUint) / sizeof(byte)× (8× on x86-64).
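To illustrate why a bytewise loop loses ground as the input grows, here is a minimal C sketch of a word-at-a-time comparison. It is not the RTL code from #40120; the name compare_bytes_wordwise is hypothetical, and memcmp-like semantics are assumed (negative, zero or positive result, bytes compared as unsigned). It checks sizeof(uintptr_t) bytes per step and falls back to a bytewise loop only for the word containing the first difference and for the tail, so its per-byte cost shrinks as len grows.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time comparison (illustrative sketch only).
   Returns <0, 0 or >0 like memcmp; bytes are compared as unsigned. */
static int compare_bytes_wordwise(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a, *pb = b;
    size_t i = 0;

    /* Compare one machine word (4 bytes on i386, 8 on x86-64) per step.
       memcpy avoids undefined behaviour from unaligned word loads. */
    while (len - i >= sizeof(uintptr_t)) {
        uintptr_t wa, wb;
        memcpy(&wa, pa + i, sizeof wa);
        memcpy(&wb, pb + i, sizeof wb);
        if (wa != wb)
            break; /* first difference is inside this word */
        i += sizeof(uintptr_t);
    }

    /* Resolve the differing word, or the short tail, one byte at a time. */
    for (; i < len; i++) {
        if (pa[i] != pb[i])
            return (int)pa[i] - (int)pb[i];
    }
    return 0;
}
```

A plain bytewise loop is the same function without the first loop: it does one comparison per byte, which is why its disadvantage tends toward the word size as len grows.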
See #40120 (closed) itself for the benchmark.
Maybe disable them unless someone comes up with a better solution?