x86_64.inc: shorten Interlocked*, perform macro-fused test+jz in Index* early.
Well, I tried to apply this new hotness to `x86_64.inc`, and it turned out to be impossible because `x86_64.inc` belongs to `system` and has to be edible by the bootstrap compiler. But in the meantime I noticed two more things about `x86_64.inc`:
- Certain `Interlocked`s contain strange register-register `xchg`s that can be folded into the surrounding instructions without a trace.
- In `IndexByte` / `IndexWord`, it’s better not to split `test len, len; jz .Lnotfound` and manually schedule its halves between SSE operations, but instead to do the opposite: perform `test` + `jz` back to back, at the very beginning. Reasons: 1) the pair macro-fuses, 2) `len = 0` might occasionally be a common case :P.
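
To illustrate the `IndexByte` point, here is a rough before/after sketch in Intel syntax. It is not the actual code from `x86_64.inc`: the register assignment (`len` in `rsi`, `b` in `edx`, as under the System V convention) and the exact byte-broadcast sequence are assumptions made for the example; only `test len, len; jz .Lnotfound` comes from the description above.

```asm
    // Before (hypothetical): test and jz are separated by an SSE instruction.
    movd       xmm1, edx        // start broadcasting b (assumed to be in edx)
    test       rsi, rsi         // len = 0? (len assumed to be in rsi)
    punpcklbw  xmm1, xmm1       // SSE op scheduled between test and jz
    jz         .Lnotfound       // flags from test are still intact here
    punpcklbw  xmm1, xmm1
    pshufd     xmm1, xmm1, 0    // xmm1 = b repeated 16 times

    // After (hypothetical): test and jz are adjacent and come first.
    test       rsi, rsi
    jz         .Lnotfound       // adjacent to test, so the pair can macro-fuse
    movd       xmm1, edx        // the broadcast only runs when len <> 0
    punpcklbw  xmm1, xmm1
    punpcklbw  xmm1, xmm1
    pshufd     xmm1, xmm1, 0
```

Neither `movd` nor `punpcklbw` modifies the flags, which is why the split version is valid in the first place. The adjacent version lets cores that support macro-fusion treat `test` + `jz` as a single operation, and it skips the SSE setup entirely when `len = 0`.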