-O2 cmovcc optimization on x64 causes data loss when dealing with Int64s

Summary

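At -O2 on x86-64, the optimizer emits a cmovcc with 32-bit operands for a conditional assignment to a 64-bit variable. In 64-bit mode this clears the upper 32 bits of the destination register even when the condition is false, so the variable loses its high half.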

System Information

  • Operating system: Ubuntu Linux
  • Processor architecture: x86-64
  • Compiler version: trunk as of 2025-07-04
  • Device: Computer

Steps to reproduce

Compile this code with -O2:

{$MODE OBJFPC}
program bug;

function Test(A: DWord; B: Int64): Int64;
begin
   while (True) do
   begin
      if (B > 0) then
      begin
         // B is -1, so this line does not run
         B := A;
      end;
      Result := B;
      break;
   end;
end;

var
   V: Int64;
begin
   V := Test(0, -1);
   Writeln('Result: ', V, ' (should be -1)');
end.

What is the current bug behavior?

When compiled with fpc -O2 test.pas and run, this reports a result of 4294967295, which is $00000000FFFFFFFF: the low 32 bits of -1 with the high 32 bits cleared.

What is the expected (correct) behavior?

When compiled with fpc -O1 test.pas and run, this outputs -1. This is what should always happen with this code.

Cause

The compiler generates the following assembly at -O2:

P$BUG_$$_TEST$LONGWORD$INT64$$INT64:
.Lc2:
        testq   %rsi,%rsi
        cmovgl  %edi,%esi
        movq    %rsi,%rax
.Lc3:
        ret

Unfortunately, in 64-bit mode a cmovcc with a 32-bit operand size (here, cmovgl) always clears the high 32 bits of the destination register, even when the condition is false. This means that if the variable being conditionally assigned to is 64 bits wide, its high half is clobbered even when the assignment is not supposed to happen.

This is the pseudocode for the operation, as documented by Intel:

temp := SRC
IF condition TRUE
    THEN DEST := temp;
ELSE IF (OperandSize = 32 and IA-32e mode active)
    THEN DEST[63:32] := 0;
FI;

Note in particular the ELSE branch.
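
To make the effect concrete, here is a minimal software model of that pseudocode. Cmov32 is a hypothetical helper written for illustration only (it is not part of the compiler or the RTL); it assumes the destination is a 64-bit register and the source is a 32-bit value, as in the cmovgl above.

```pascal
{$MODE OBJFPC}
program cmovmodel;

// Hypothetical helper (illustration only): models what a cmovcc with 32-bit
// operands does to a 64-bit destination register in 64-bit mode.
function Cmov32(Dest: QWord; Src: DWord; Condition: Boolean): QWord;
begin
   if Condition then
      Result := Src                        // DEST := temp, zero-extended to 64 bits
   else
      Result := Dest and QWord($FFFFFFFF); // DEST[63:32] := 0, low half unchanged
end;

var
   B: Int64;
begin
   B := -1;
   // The condition (B > 0) is false, yet the high half of B is still cleared.
   B := Int64(Cmov32(QWord(B), 0, B > 0));
   Writeln('Modelled result: ', B);
end.
```

With B = -1 the model yields 4294967295, matching the value the -O2 build reports.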

Weirdly, this only happens with the outer loop. If you replace the code with:

function Test(A: DWord; B: Int64): Int64;
begin
   if (B > 0) then
   begin
      // B is -1, so this line does not run
      B := A;
   end;
   Result := B;
end;

Then the compiler generates:

P$BUG_$$_TEST$LONGWORD$INT64$$INT64:
.Lc2:
        movq    %rsi,%rax
        testq   %rsi,%rsi
        jng     .Lj6
        andl    %edi,%edi
        movq    %rdi,%rax
.Lj6:
.Lc3:
        ret

...which works fine (and has no cmovcc).
