Improve x86_64.inc:IndexByte/Word.

I did the same with the `i386.inc` counterparts in !405 (merged), but the `x86_64.inc` versions aren’t “mine”, so that would have been off-topic there.

- Remove runtime adapters between ABIs, at the cost of a tolerable number of extra `{$ifdef}`s. This changes nothing on Windows, but saves 3 instructions on Linux (3 `mov` + 1 `add` → 1 single-component `lea`).
- Replace the unconditional `jmp` with a conditional `jz` and a fallthrough to `.Lmatch`. This logically duplicates the check inside the loop, but does so “for free”: on `i386`, the required flag happens to be already set by the preceding `shr`, and on `x86-64`, `test` + `jz` should macro-fuse. This saves 2 jumps (this `jmp` plus the `jz .Lmatch` from the loop) if the match occurs before the first alignment boundary to the right (i.e. among the first 1–16 bytes, 8(.5) on average; so it might even observably improve !527, which abuses tiny `IndexByte`s, but I didn’t check :D), and does not change the other cases (in the sense that they perform the same number of jumps as before).
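
As a sanity check of the “1–16 bytes, 8(.5) on average” figure, here is a minimal sketch in Python. It assumes a 16-byte alignment boundary (taken from the 1–16 range above) and start addresses spread uniformly over all 16 offsets, which is an assumption made only for illustration. `head_len` is a hypothetical helper, not anything from the RTL.

```python
from fractions import Fraction

ALIGN = 16  # assumed alignment in bytes, matching the "1-16 bytes" range above


def head_len(addr: int, align: int = ALIGN) -> int:
    """Bytes from addr up to the next aligned boundary strictly to the right."""
    return align - (addr % align)


# One entry per possible start offset within an aligned block
# (uniform distribution of offsets is an assumption).
head_lens = [head_len(offset) for offset in range(ALIGN)]
mean_head = Fraction(sum(head_lens), len(head_lens))

print(min(head_lens), max(head_lens))  # shortest and longest head
print(float(mean_head))                # average head length in bytes
```

Under these assumptions the head is between 1 and 16 bytes long and averages 8.5 bytes; real buffers are often allocated with some alignment already, so the actual distribution may well differ.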