x86: Missed logic with CMP and MOV optimisations
This merge request fixes a few minor issues with peephole optimisations:
- cmpq (const),%reg instructions did not check whether any MOV instructions following them could be moved in front of them. The "CmpJe2NegJo" optimisation exited the entire procedure when the instruction size was S_Q, instead of moving on to the next optimisation (the one that moves MOV instructions).
- The "J(c)Mov0JmpMov1 -> Set(c)" optimisations didn't call AllocRegBetween to properly track the register used by SETcc. There didn't seem to be any adverse effects from this oversight, but incorrect register tracking may cause future problems with optimisations that depend closely on it.
- "movq $0,%regq" was not optimised to "movl $0,%regl" when the FLAGS register was in use (otherwise it is optimised to "xorl %regl,%regl"), because the size reduction optimisation only checked values between 1 and $FFFFFFFF. In practice this is usually overshadowed by the CMP/MOV fix above. Also, because this optimisation applies only to x86_64 (it depends on 64-bit registers), the code is now disabled for i386 and i8086 with no loss.
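The range check in the last point can be illustrated with a small toy model. This is only a sketch in Python, not the compiler's code: the function name `encode_mov_imm64` and the `lower_bound` parameter are hypothetical, and the real optimiser may split these decisions across separate passes. It relies on two facts about x86_64: writing a 32-bit register zero-extends into the full 64-bit register, and XOR modifies FLAGS.

```python
def encode_mov_imm64(imm, flags_in_use, lower_bound=0):
    """Choose an encoding for 'movq $imm,%regq' in a toy model.

    lower_bound=1 mimics the old range check (1..$FFFFFFFF);
    lower_bound=0 mimics the fixed check, which also accepts zero.
    """
    if imm == 0 and not flags_in_use:
        # XOR is the shortest way to zero a register, but it modifies FLAGS,
        # so it is only usable when nothing depends on the current flags.
        return "xorl %regl,%regl"
    if lower_bound <= imm <= 0xFFFFFFFF:
        # A 32-bit write zero-extends to 64 bits, so the shorter form is safe.
        return f"movl ${imm},%regl"
    # Anything else keeps the full 64-bit form in this sketch.
    return f"movq ${imm},%regq"
```

With `lower_bound=1` and FLAGS in use, a zero immediate falls through to the 64-bit form, which is the case described above; with the default `lower_bound=0` it is reduced to the 32-bit move.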
Example from the System unit, compiled under -O4 for x86_64-win64. Trunk:
.section .text.n_system_$$_pos$shortstring$char$int64$$int64,"ax"
.balign 16,0x90
.globl SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64
SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64:
cmpb $1,(%rcx)
jne .Lj1460
cmpb 1(%rcx),%dl
jne .Lj1460
cmpq $1,%r8
# Peephole Optimization: J(c)Mov1JmpMov0 -> Set(~c) (partial)
movq $0,%rax
seteb %al
ret
.p2align 4,,10
.p2align 3
.Lj1460:
# Peephole Optimization: xorq %rax,%rax -> xorl %eax,%eax (removes REX prefix)
xorl %eax,%eax
ret
Merge request:
.section .text.n_system_$$_pos$shortstring$char$int64$$int64,"ax"
.balign 16,0x90
.globl SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64
SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64:
cmpb $1,(%rcx)
jne .Lj1460
cmpb 1(%rcx),%dl
jne .Lj1460
# Peephole Optimization: Swapped cmp and mov instructions to improve optimisation potential
# Peephole Optimization: xorq %rax,%rax -> xorl %eax,%eax (removes REX prefix)
xorl %eax,%eax
cmpq $1,%r8
# Peephole Optimization: J(c)Mov1JmpMov0 -> Set(~c) (partial)
seteb %al
ret
.p2align 4,,10
.p2align 3
.Lj1460:
# Peephole Optimization: xorq %rax,%rax -> xorl %eax,%eax (removes REX prefix)
xorl %eax,%eax
ret
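The difference between the two listings can be reproduced with a toy peephole pass. The sketch below is written in Python purely for illustration and is not taken from the compiler: the tuple representation, the `LOW32` table and both function names are assumptions, and register aliasing (for example %eax versus %rax) is ignored, with operands compared as plain text.

```python
# Toy model: an instruction is a tuple (mnemonic, source, destination).
# LOW32 is a deliberately tiny, illustrative register table.
LOW32 = {"%rax": "%eax", "%rcx": "%ecx", "%rdx": "%edx"}


def swap_cmp_mov(instrs):
    """Move 'mov $const,%reg' from directly after a cmp to directly before it."""
    out = list(instrs)
    for i in range(len(out) - 1):
        first, second = out[i], out[i + 1]
        if (first[0].startswith("cmp")
                and second[0].startswith("mov")
                and second[1].startswith("$")     # constant source only
                and second[2] not in first[1]     # the cmp must not read the
                and second[2] not in first[2]):   # register the mov writes
            # A MOV does not touch FLAGS, so the cmp result is unaffected.
            out[i], out[i + 1] = second, first
    return out


def zero_mov_to_xor(instrs):
    """Rewrite 'movq $0,%reg' as 'xorl %reg32,%reg32' when a cmp follows."""
    out = list(instrs)
    for i in range(len(out) - 1):
        op, src, dst = out[i]
        # The following cmp overwrites every status flag, so the flags are
        # not live across this mov and the clobber caused by XOR is harmless.
        if (op == "movq" and src == "$0" and dst in LOW32
                and out[i + 1][0].startswith("cmp")):
            out[i] = ("xorl", LOW32[dst], LOW32[dst])
    return out
```

Applied to the three instructions `cmpq $1,%r8`, `movq $0,%rax`, `seteb %al`, the first pass swaps the CMP and the MOV, and the second pass can then turn the MOV into `xorl %eax,%eax` because FLAGS is no longer in use at that point.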
Criteria
Confirm correct compilation, no regressions and slightly improved generated code, especially if a condition is written to a Boolean type that's longer than a byte.
Future Development
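Going by the sample procedure alone, there seems to be potential to optimise further by hoisting xorl %eax,%eax above the conditional jumps, since zeroing that register is common to all the branches. Because XOR modifies FLAGS, it would have to be placed before the corresponding CMP rather than between a CMP and its jump.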