-O2 cmovcc optimization on x64 causes data loss when dealing with Int64s
Summary
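At -O2 on x86-64, the optimizer emits a 32-bit cmovcc even when the conditionally assigned variable is 64 bits wide. This causes data loss, because a 32-bit cmovcc clears the upper 32 bits of the destination register even when the condition is false.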
System Information
- Operating system: Ubuntu Linux
- Processor architecture: x86-64
- Compiler version: trunk as of 2025-07-04
- Device: Computer
Steps to reproduce
Compile this code with -O2:
{$MODE OBJFPC}
program bug;
function Test(A: DWord; B: Int64): Int64;
begin
  while (True) do
  begin
    if (B > 0) then
    begin
      // B is -1, so this line does not run
      B := A;
    end;
    Result := B;
    break;
  end;
end;
var
  V: Int64;
begin
  V := Test(0, -1);
  Writeln('Result: ', V, ' (should be -1)');
end.
What is the current bug behavior?
When compiled with fpc -O2 test.pas and run, the program outputs 4294967295 instead of -1.
What is the expected (correct) behavior?
When compiled with fpc -O1 test.pas and run, the program outputs -1. This is the correct result, and it should be produced at every optimization level.
Cause
At -O2, the compiler generates the following assembly for Test:
P$BUG_$$_TEST$LONGWORD$INT64$$INT64:
.Lc2:
    testq   %rsi,%rsi
    cmovgl  %edi,%esi
    movq    %rsi,%rax
.Lc3:
    ret
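Unfortunately, in 64-bit mode a cmovcc with a 32-bit operand size (here cmovgl) always clears the upper 32 bits of the destination register, even when the condition is false. If the variable being conditionally assigned is 64 bits wide, its high half gets clobbered even when the assignment is not supposed to happen. Here %rsi holds B = -1 (0xFFFFFFFFFFFFFFFF). testq sees a negative value, so the move is not performed, yet %rsi still ends up as 0x00000000FFFFFFFF, i.e. 4294967295.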
This is the pseudocode for the operation, as documented by Intel:
temp := SRC
IF condition TRUE
    THEN DEST := temp;
    ELSE IF (OperandSize = 32 and IA-32e mode active)
        THEN DEST[63:32] := 0;
FI;
Note in particular the ELSE branch: even when the condition is false, a 32-bit operand size still writes zeros to DEST[63:32].
Oddly, this only happens when the outer loop is present. If the function is rewritten without it:
function Test(A: DWord; B: Int64): Int64;
begin
  if (B > 0) then
  begin
    // B is -1, so this line does not run
    B := A;
  end;
  Result := B;
end;
Then the compiler generates:
P$BUG_$$_TEST$LONGWORD$INT64$$INT64:
.Lc2:
    movq    %rsi,%rax
    testq   %rsi,%rsi
    jng     .Lj6
    andl    %edi,%edi
    movq    %rdi,%rax
.Lj6:
.Lc3:
    ret
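...which is correct and contains no cmovcc. The only 32-bit write (andl %edi,%edi, which zero-extends A into %rdi) is on the path that runs when B > 0.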