[x86 / Bug Fix] Fixed bug in TryCmpCMovOpts peephole optimization
Summary
This merge request fixes an oversight in the TryCmpCMovOpts
routine (called by OptPass2CMP
and OptPass2TEST
) that caused bad code to be generated in a few circumstances.
System
- Operating system: Windows (possibly Linux as well)
- Processor architecture: i386 (possibly x86_64 as well)
What is the current bug behavior?
In some situations, the TryCmpCMovOpts
would not stop where it was supposed to and end up erroneously moving unrelated MOV
instructions and changing their destination registers.
What is the behavior after applying this patch?
The above circumstance should no longer happen and complementary CMOV
pairs properly optimised.
Relevant logs and/or screenshots
The compiler bug notably manifested in the hlcgcpu
unit, specifically the g_intf_wrapper
procedure, although it appeared in other places too. After fixing and re-enabling the optimisation, the following improvement is made (i386-win32, -O4 -CpCOREAVX2) - before:
...
movl $8,%ecx
cmpb $2,33(%eax)
# Peephole Optimization: CMOV Block (Branching type)
cmoval %ecx,%eax
movl $4,%ecx
# Peephole Optimization: CMOV Block (Simple type)
cmovnal %ecx,%eax
pushl $17039367
# Peephole Optimization: Lea2AddBase done
addl %edx,%eax
pushl %eax
pushl TC_$CGUTILS_$$_CTEMPPOSINVALID
pushl $4
pushl $0
movl -12(%ebx),%eax
movl U_$SYMDEF_$$_VOIDSTACKPOINTERTYPE,%ecx
...
After:
...
movl $8,%ecx
cmpb $2,33(%eax)
movl $4,%eax
# Peephole Optimization: CMOV Block (Branching type)
# Peephole Optimization: CMovCMov2MovCMov 1
cmoval %ecx,%eax
# Peephole Optimization: CMOV Block (Simple type)
pushl $17039367
# Peephole Optimization: Lea2AddBase done
addl %edx,%eax
pushl %eax
pushl TC_$CGUTILS_$$_CTEMPPOSINVALID
pushl $4
pushl $0
movl -12(%ebx),%eax
movl U_$SYMDEF_$$_VOIDSTACKPOINTERTYPE,%ecx
...
When it was bugged, the movl U_$SYMDEF_$$_VOIDSTACKPOINTERTYPE,%ecx
instruction and a couple of others were moved to before the comparison, and changed the destination register to %eax
. Notably, the bug fix also ensures the MOV
instruction is placed after the triggering CMP
or TEST
instruction if the new destination register appears in it.
Improvements also appear elsewhere. In the adler
unit for example (i386-win32, -O4 -CpCOREAVX2) - before:
.Lj7:
cmpl $3854,%ecx
cmovbw %cx,%bx
movw $3854,%dx
cmovnbw %dx,%bx
movswl %bx,%edx
subl %edx,%ecx
jmp .Lj14
After (note that this time the MOV
is placed before the comparison because it doesn't share any registers with it, thus minimising pipeline stalls):
.Lj7:
movw $3854,%bx
cmpl $3854,%ecx
cmovbw %cx,%bx
movswl %bx,%edx
subl %edx,%ecx
jmp .Lj14
The same optimisation is made under x86_64-win64 as well:
...
.Lj7:
cmpl $3854,%r8d
cmovbw %r8w,%r11w
movw $3854,%dx
cmovnbw %dx,%r11w
movswl %r11w,%edx
subl %edx,%r8d
jmp .Lj14
...
After:
...
.Lj7:
movw $3854,%r11w
cmpl $3854,%r8d
cmovbw %r8w,%r11w
movswl %r11w,%edx
subl %edx,%r8d
jmp .Lj14
...