
Improve x86_64.inc:IndexByte/Word.

Rika requested to merge runewalsh/source:ibw-x64 into main

I did the same for the i386.inc counterparts in !405 (merged), but the x86_64.inc versions aren’t “mine”, so changing them there would have been off-topic.

  1. Remove the runtime adapters between ABIs, at the cost of a modest number of extra {$ifdef}s. This changes nothing on Windows, but saves 3 instructions on Linux (3 mov + 1 add → a single 1-component lea).

  2. Replace the unconditional jmp with a conditional jz followed by a fallthrough to .Lmatch. This logically duplicates the check inside the loop, but does so “for free”: on i386 the required flag is already set by the preceding shr, and on x86-64 test+jz should macro-fuse. It saves 2 jumps (this jmp plus the jz .Lmatch from the loop) if the match occurs before the first alignment boundary to the right, i.e. among the first 1–16 bytes (8.5 on average). So it might even observably improve !527, which abuses tiny IndexBytes, but I didn’t check :D. All other cases are unchanged, in the sense that they perform the same number of jumps as before.
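For readers who don’t want to trace the assembly, here is a rough portable C model of the control flow described in point 2. It is a sketch under assumptions, not the actual x86_64.inc code. The names `block_mask` and `index_byte` are hypothetical. `block_mask` stands in for the vector compare + movemask on one aligned 16-byte block. The assembly can safely read the whole aligned block even where it extends past the buffer, but C may not, so this model simply ignores out-of-range indices; the real code discards the bytes before the start with a shift instead. The point is the shape: after the first block, a conditional “jz into the loop” either enters the aligned loop or falls straight through to the match code, so an early match skips the loop entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models compare + movemask on the aligned 16-byte
   block starting at index `start` (may be negative). Bits are only set for
   indices inside [0, len), so this C model never reads out of bounds. */
static unsigned block_mask(const unsigned char *buf, ptrdiff_t len,
                           ptrdiff_t start, unsigned char b)
{
    unsigned mask = 0;
    for (int j = 0; j < 16; j++) {
        ptrdiff_t k = start + j;
        if (k >= 0 && k < len && buf[k] == b)
            mask |= 1u << j;
    }
    return mask;
}

/* Returns the index of the first occurrence of b in buf[0..len), or -1. */
static ptrdiff_t index_byte(const unsigned char *buf, ptrdiff_t len,
                            unsigned char b)
{
    if (len <= 0)
        return -1;

    /* Aligned block that contains buf[0]; its start is 0..15 bytes before buf. */
    ptrdiff_t start = -(ptrdiff_t)((uintptr_t)buf & 15);
    unsigned mask = block_mask(buf, len, start, b);

    if (mask == 0) {             /* the new conditional "jz" into the loop */
        do {
            start += 16;         /* next aligned block */
            if (start >= len)
                return -1;       /* ran past the end: not found */
            mask = block_mask(buf, len, start, b);
        } while (mask == 0);     /* the loop's own check */
    }

    /* Match path: reached by fallthrough when the first block already
       matched, or after the loop. Lowest set bit = first match (like bsf). */
    int bit = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        bit++;
    }
    return start + bit;
}
```

For example, `index_byte((const unsigned char *)"hello, world", 12, 'w')` finds `'w'` in the first block and never enters the loop, which is exactly the case point 2 speeds up.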
