[x86] Bad code generated by TEST/JNE/TEST/JNE optimisation under -O3
Summary
There is a subtle bug in the TEST/JNE/TEST/JNE
peephole optimisation that changes the order of conditions and can result in the wrong branch being taken. It is possibly causing subtle failures on projects built under -O3 and above and at least one case of bad code generation appears in the compiler itself.
System Information
- Processor architecture: i386, x86_64
Steps to reproduce
Compile the following test project under -O3 -Sg (TestFunc
is where the bad code gets generated).`
program twblockf;
function TestFunc(Input1, Input2: Cardinal): Cardinal;
label
PrintNegative, PrintPositive;
begin
{ I know gotos are ugly, but I need to force a specific assembly genereation }
if (Input1 and $1) <> 0 then
goto PrintPositive;
if Input2 > 3 then
goto PrintNegative;
if (Input1 and $2) <> 0 then
goto PrintPositive;
PrintNegative:
{ Use arithmetic to prevent unwanted JMP/RET optimisations }
Exit(Input2 or $80000000);
PrintPositive:
{ Use arithmetic to prevent unwanted JMP/RET optimisations }
Exit(Input1 + Input2);
end;
const
Inputs1: array[0..4] of Cardinal = (1, 2, 2, 3, 4);
Inputs2: array[0..4] of Cardinal = (4, 3, 5, 6, 7);
Expected: array[0..4] of Cardinal = (5, 5, $80000005, 9, $80000007);
var
X: Integer; Output: Cardinal;
begin
for X := Low(Inputs1) to High(Inputs2) do
begin
Output := TestFunc(Inputs1[X], Inputs2[X]);
if Output <> Expected[X] then
begin
WriteLn('FAIL: TestFunc(', Inputs1[X], ', ', Inputs2[X], ') returned ', Output, ' but expected ', Expected[X]);
Halt(1);
end;
end;
WriteLn('ok');
end.
The test fails with the message FAIL: TestFunc(2, 5) returned 7 but expected 2147483653
.
In essence, the bad code is generated when 3 conditions run adjacent to each other, with the 1st and 3rd testing the same register for different bits and the 2nd condition checking a different register or address. If the 1st and 3rd conditions can be combined because any of the bits being set leads to the same branch, then they are incorrectly merged when a different condition is between them (without the 2nd condition, the merging would be correct)
What is the current bug behavior?
The condition order is being violated and causing the wrong conditional branch to be taken.
What is the expected (correct) behavior?
The peephole optimizer should not be generating bad code!
Relevant logs and/or screenshots
This is TestFunc
in the above test project on x86_64-win64 under -O2 (non-jump labels removed for clarity):
.section .text.n_p$twblockf_$$_testfunc$longword$longword$$longword,"ax"
.balign 16,0x90
.globl P$TWBLOCKF_$$_TESTFUNC$LONGWORD$LONGWORD$$LONGWORD
P$TWBLOCKF_$$_TESTFUNC$LONGWORD$LONGWORD$$LONGWORD:
testb $1,%cl
jne .Lj7
cmpl $3,%edx
ja .Lj10
testb $2,%cl
jne .Lj7
.p2align 4,,10
.p2align 3
.Lj10:
movl %edx,%eax
orl $-2147483648,%eax
ret
.p2align 4,,10
.p2align 3
.Lj7:
leal (%ecx,%edx),%eax
ret
This is the same function under -O3 - due to a cascade of other optimisations, the second condition gets removed completely:
.section .text.n_p$twblockf_$$_testfunc$longword$longword$$longword,"ax"
.balign 16,0x90
.globl P$TWBLOCKF_$$_TESTFUNC$LONGWORD$LONGWORD$$LONGWORD
P$TWBLOCKF_$$_TESTFUNC$LONGWORD$LONGWORD$$LONGWORD:
testb $3,%cl
jne .Lj7
movl %edx,%eax
orl $-2147483648,%eax
ret
.p2align 4,,10
.p2align 3
.Lj7:
leal (%ecx,%edx),%eax
ret
In the compiler (this time, using i386-win32 as an example) in the psub
unit, the tcgprocinfo.set_eh_info
gets bad code generated under -O3:
.section .text.n_psub$_$tcgprocinfo_$__$$_set_eh_info,"ax"
.balign 16,0x90
.globl PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO:
pushl %ebx
movl %eax,%ebx
call PROCINFO$_$TPROCINFO_$__$$_SET_EH_INFO
testb $16,U_$SYSTEMS_$$_TARGET_INFO+60
je .Lj426
movl 124(%ebx),%eax
testl $20,%eax
jne .Lj427
.Lj426:
testl $33554432,124(%ebx)
je .Lj434
.Lj427:
movl $_$PSUB$_Ld25,%eax
call SYMTABLE_$$_SEARCH_SYSTEM_PROC$TIDSTRING$$TPROCDEF
movl %eax,%edx
movl 72(%ebx),%eax
call SYMDEF$_$TPROCDEF_$__$$_SETPERSONALITY$TPROCDEF
.Lj434:
popl %ebx
ret
For clarity, this is the routine in question:
procedure tcgprocinfo.set_eh_info;
begin
inherited;
if (tf_use_psabieh in target_info.flags) and
((pi_uses_exceptions in flags) or
((cs_implicit_exceptions in current_settings.moduleswitches) and
(pi_needs_implicit_finally in flags))) or
(pi_has_except_table_data in flags) then
procdef.personality:=search_system_proc('_FPC_PSABIEH_PERSONALITY_V0');
end;
The testl $20,%eax
instruction was formed by erroneously merging (pi_uses_exceptions in flags)
and (pi_needs_implicit_finally in flags)
, and the intermediate condition (cs_implicit_exceptions in current_settings.moduleswitches)
gets incorrectly optimised out and is never checked.
Possible fixes
The TEST/JNE/TEST/JNE
optimisation must not use GetNextInstructionUsingReg
to search for the second TEST/JNE
pair. Code between them must be executed if the first TEST/JNE
pair doesn't branch.