2-instruction 0-jump 0-cmov BSR/F on x86 based on half-documented reality.
Free Pascal documents that BSR/F
return 255 on zero input, and presently they are compiled into:
{-$optimization nopeephole}
var
x: uint32;
begin
x := random(0) + 5;
x := BsrDWord(x);
writeln(x);
end.
; x := BsrDWord(x);
mov ecx, $FF
bsr ebx, eax
cmovz ebx, ecx
Until recent changes it was even (try nopeephole
):
; x := BsrDWord(x);
bsr ebx, eax
jnz .done
mov ebx, $FF
.done:
Now, let’s look at the vendor’s documentation.
AMD:
If the second operand contains 0, the instruction sets ZF to 1 and does not change the contents of the destination register.
Intel:
If the content of the source operand is 0, the content of the destination operand is undefined.
In this seemingly unfortunate situation, we shall turn to folklore: [1], [2], [3], [4]. They say that despite wording, Intel silently implements the same behavior as AMD, which makes sense.
So, in practice, dest := BSF/R(src)
can be compiled into MOV dest, $FF; BSF/R dest, src
.
Benchmark: BsfBenchmark.pas.
My speedup: 0.9
→0.3 ns/call
.
Grains of salt:
-
None of the compilers on Godbolt do this, maybe for a reason, maybe not...
-
Folklore is discreet about ancient Intel CPUs, consult your sources.
-
I have positively no idea what’s going on in the code I changed, please recheck.
-
what happened with pipelines