Apply recent `x86_64.inc` ideas to `i386.inc`, and a bit on top.
I went into `i386.inc` on other business and noticed the following possibilities:
- As in !509 (merged), certain `Interlocked*` functions have strange, easily eliminable `xchg`s. Also make them `nostackframe`.
- As in !426 (merged), `cpu*locked` operations that check `IsMultiThread` can jump over the `LOCK` prefix, using a second parameter to receive `IsMultiThread` in a high-level manner. This should be easier to inline.
- `BsfQWord` has an unused `.L1` label, and it could completely mirror `BsrDWord` but does not. I think the `BsrQWord` approach, with the additional replacement of `jmp <end>` by a direct `ret $8`, is better: it takes zero jumps in one of the two common cases (“primary” half is nonzero), one jump in the other common case (“primary” half is zero, “secondary” is nonzero), and two jumps in the rare-to-impossible case (input = 0), whereas `BsfQWord` is geared toward input = 0.
- `SarInt64` can ignore the possibility of `Shift > 63` (such shifts are undefined, I hope?) but rely on the fact that x86 `sar cl, r32` uses only the 5 lower bits of `cl` (so `and cl, 31` can be omitted), and also (arguably...) skip reading the lower half unless the shift is indeed less than 32.
- `InterlockedCompareExchange64` is not published and seemingly not used by anyone. I made a separate MR for it.
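The `cpu*locked` idea (receive `IsMultiThread` as a parameter, and skip the `LOCK` prefix when it is false) can be modelled at a high level in C. This is a minimal sketch, not the RTL code: `cpu_inc_locked` and its `is_multithread` parameter are hypothetical names standing in for a `cpu*locked` routine and the `IsMultiThread` value, and C11 atomics stand in for the hand-written assembly. When the flag is false, the update is a plain load and store instead of an atomic read-modify-write, which corresponds to jumping over the `LOCK` prefix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of a cpu*locked increment. The caller passes the
   IsMultiThread flag as an ordinary parameter instead of the routine
   reading the global itself. Returns the incremented value. */
static inline uint32_t cpu_inc_locked(_Atomic uint32_t *target, bool is_multithread)
{
    assert(target != NULL);
    if (is_multithread) {
        /* Atomic read-modify-write; on x86 this typically compiles to a
           LOCK-prefixed instruction. */
        return atomic_fetch_add_explicit(target, 1u, memory_order_seq_cst) + 1u;
    }
    /* Single-threaded path: an ordinary read-modify-write with no LOCK
       prefix. Relaxed accesses keep this well defined on an _Atomic object. */
    uint32_t v = atomic_load_explicit(target, memory_order_relaxed) + 1u;
    atomic_store_explicit(target, v, memory_order_relaxed);
    return v;
}
```

Because the flag is a parameter rather than a global read inside the routine, a caller that inlines it can see the branch directly, which is presumably what makes this shape easier to inline.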
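The branch ordering argued for in the `BsfQWord` bullet (test the “primary” half first, then the “secondary” half, and handle input = 0 last) can be illustrated in C on two 32-bit halves. This is a sketch under stated assumptions: `bsf_qword` and `bsr_qword` are hypothetical names, GCC/Clang's `__builtin_ctz`/`__builtin_clz` stand in for `bsf`/`bsr` on one half, and a zero input is assumed to return 255. It models only the order of the tests; unlike the hand-written assembly, the actual jump layout is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a zero input yields 255, used here as the "not found" value. */
#define BSX_NOT_FOUND 255u

/* Index of the lowest set bit. The low half is "primary". */
static unsigned bsf_qword(uint64_t value)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);
    if (lo != 0)                          /* common case: answered after one test */
        return (unsigned)__builtin_ctz(lo);
    if (hi != 0)                          /* other common case: second test */
        return 32u + (unsigned)__builtin_ctz(hi);
    return BSX_NOT_FOUND;                 /* rare case: input == 0 */
}

/* Index of the highest set bit: the same shape with the halves swapped,
   so the high half is "primary". */
static unsigned bsr_qword(uint64_t value)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);
    if (hi != 0)
        return 32u + (31u - (unsigned)__builtin_clz(hi));
    if (lo != 0)
        return 31u - (unsigned)__builtin_clz(lo);
    return BSX_NOT_FOUND;
}
```

With this layout, the two routines are mirror images of each other, differing only in which half is tested first.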
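The `SarInt64` bullet relies on two x86 facts: `sar` with a count in `cl` uses only the low 5 bits of `cl`, and for counts from 32 to 63 the result depends only on the upper half. The C sketch below models this on 32-bit halves. `sar32` and `shrd32` are hypothetical helpers emulating the `sar` and `shrd` instructions (C leaves shifts by 32 or more undefined, so the 5-bit mask the hardware applies implicitly has to be written out there), and `shift <= 63` is taken as a precondition, matching the assumption that larger shifts are undefined. It also assumes GCC/Clang behaviour for right-shifting negative signed integers (arithmetic) and for out-of-range signed conversions (modular).

```c
#include <assert.h>
#include <stdint.h>

/* Emulates x86 "sar r32, cl": the hardware uses only the low 5 bits of cl. */
static int32_t sar32(int32_t x, unsigned cl)
{
    return x >> (cl & 31u);   /* arithmetic shift of negative x on GCC/Clang */
}

/* Emulates x86 "shrd lo, hi, cl": low 32 bits of (hi:lo) >> (cl & 31). */
static uint32_t shrd32(uint32_t lo, uint32_t hi, unsigned cl)
{
    return (uint32_t)((((uint64_t)hi << 32) | lo) >> (cl & 31u));
}

/* Hypothetical sketch of the proposed SarInt64 logic.
   Precondition (assumed): shift <= 63. */
static int64_t sar_int64(int64_t value, unsigned shift)
{
    assert(shift <= 63);
    int32_t hi = (int32_t)(uint32_t)((uint64_t)value >> 32);
    uint32_t new_lo;
    int32_t new_hi;
    if (shift >= 32) {
        /* sar32 masks the count to shift - 32, so no "and cl, 31" is needed.
           The lower half of value is never read on this path. */
        new_lo = (uint32_t)sar32(hi, shift);
        new_hi = sar32(hi, 31);               /* fill with the sign bit */
    } else {
        uint32_t lo = (uint32_t)value;        /* read only when shift < 32 */
        new_lo = shrd32(lo, (uint32_t)hi, shift);
        new_hi = sar32(hi, shift);
    }
    return (int64_t)(((uint64_t)(uint32_t)new_hi << 32) | new_lo);
}
```

In the `shift >= 32` branch the lower half is never loaded, which corresponds to the “(arguably...)” part of the proposal.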