
x86_64.inc: shorten Interlocked*, perform macro-fused test+jz in Index* early.

Rika requested to merge runewalsh/source:64ilibw into main

Well, I tried to apply this new hotness to x86_64.inc, and it turned out to be impossible: x86_64.inc belongs to system and so has to be edible by the bootstrap compiler. But in the meantime I noticed two more things about x86_64.inc:

  • Certain Interlocked* functions contain strange register-to-register xchg instructions that can be folded into the surrounding instructions without a trace.

  • In IndexByte / IndexWord, it’s better not to split test len, len; jz .Lnotfound and manually schedule the two halves between SSE operations, but to do the opposite: perform test+jz back to back, at the very beginning. Reasons: 1) an adjacent test+jz pair macro-fuses, 2) len = 0 might occasionally be a common case :P.
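
To illustrate the second point, here is a rough C analogue of the IndexByte shape after the change. It is only a sketch, not the code from x86_64.inc: the name index_byte, its signature, and the scalar fallback are all made up for this example. The length check sits first, before any SSE setup, so a compiler will typically emit it as an adjacent test+jz pair, and a zero-length call returns without touching the vector unit.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical analogue of IndexByte: index of the first byte equal to b
   in buf[0..len), or -1 if there is none. */
static ptrdiff_t index_byte(const unsigned char *buf, size_t len, unsigned char b)
{
    /* Early exit first: nothing is scheduled between the test and the branch. */
    if (len == 0)
        return -1;

    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast b to all 16 lanes, then compare 16 bytes per iteration. */
    const __m128i pattern = _mm_set1_epi8((char)b);
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, pattern));
        if (mask != 0) {
            /* The lowest set bit of mask marks the first matching byte. */
            while ((mask & 1u) == 0) {
                mask >>= 1;
                i++;
            }
            return (ptrdiff_t)i;
        }
    }
#endif
    /* Scalar tail; also the whole search when SSE2 is not available. */
    for (; i < len; i++) {
        if (buf[i] == b)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

With this layout, index_byte(buf, 0, b) returns -1 straight from the first branch, which is the len = 0 case mentioned above.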
