x86: Optimisation simplifications
Summary
This merge request simplifies a couple of post-peephole optimisations on x86-64 platforms:
- The MOVZX optimisation that changes BQ and WQ sizes to BL and WL respectively has been removed: although these operand sizes are legal, the compiler never generates them.
- The xorq->xorl optimisation no longer checks whether the register is RAX, RCX, RDX, RBX, RSI, RDI, RBP or RSP. Silvermont processors recognise only the 32-bit version as dependency-breaking, not the 64-bit version, so the change is beneficial for all registers, not just those that permit the removal of the REX prefix.
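The difference between the old and new xorq->xorl rule can be modelled outside the compiler. The following Python sketch is not the Free Pascal implementation: `rewrite_xorq`, `reg32` and the `legacy_only` flag are hypothetical names chosen for illustration, and the sketch only understands single AT&T-syntax lines of the form `xorq %reg,%reg`. It relies on the fact that writing a 32-bit register on x86-64 zeroes the upper 32 bits of its 64-bit counterpart.

```python
import re

# 64-bit registers whose 32-bit forms can be encoded without a REX prefix.
LEGACY_REGS = {"rax", "rcx", "rdx", "rbx", "rsi", "rdi", "rbp", "rsp"}
# Registers that always need a REX prefix, even in their 32-bit form.
EXTENDED_REGS = {f"r{n}" for n in range(8, 16)}

# Matches "xorq %reg,%reg" where both operands name the same register.
XORQ_SELF = re.compile(r"^(\s*)xorq\s+%(\w+),\s*%\2\s*$")


def reg32(reg64):
    """Map a 64-bit register name to its 32-bit counterpart."""
    if reg64 in LEGACY_REGS:
        return "e" + reg64[1:]   # rax -> eax
    return reg64 + "d"           # r12 -> r12d


def rewrite_xorq(line, legacy_only=False):
    """Rewrite a self-XOR of a 64-bit register to its 32-bit form.

    legacy_only=True models the old behaviour (only RAX..RSP are rewritten);
    legacy_only=False models the behaviour described in this merge request.
    """
    match = XORQ_SELF.match(line)
    if not match:
        return line
    indent, reg = match.groups()
    if reg not in LEGACY_REGS | EXTENDED_REGS:
        return line              # not a 64-bit general-purpose register
    if legacy_only and reg not in LEGACY_REGS:
        return line              # old rule skipped %r8-%r15
    return f"{indent}xorl %{reg32(reg)},%{reg32(reg)}"
```

Under this model, `rewrite_xorq("xorq %r12,%r12")` yields `xorl %r12d,%r12d`, whereas `rewrite_xorq("xorq %r12,%r12", legacy_only=True)` leaves the line unchanged, as the old rule did.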
System
- Processor architecture: x86-64 (not i386, because the blocks of code in question are surrounded by "{$ifdef x86_64}" directives).
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
- Confirm correct compilation and no test regressions.
- Confirm that removal of MOVZX optimisation has zero effect on generated code because the required input sizes are never generated.
- Confirm changes of xorq %reg,%reg to xorl %reg,%reg for registers %r8-%r15 (debug comment is slightly different now too, saying "32-bit register recommended when zeroing 64-bit counterpart" rather than "removes REX prefix").
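The reason the debug comment no longer says "removes REX prefix" can be seen by counting encoded bytes. The Python sketch below is a simplified, hypothetical encoder (`encode_xor_self` is not part of the compiler) that handles only the register-to-register self-XOR and assumes the `31 /r` opcode form; an assembler may equally pick `33 /r`, which has the same length.

```python
# Hardware register numbers used in the ModRM byte and REX prefix.
REG_INDEX = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
    "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
    **{f"r{n}": n for n in range(8, 16)},
}


def encode_xor_self(reg64, wide):
    """Encode 'xor reg,reg' for the given register.

    wide=True  -> 64-bit operand size (xorq)
    wide=False -> 32-bit operand size (xorl)
    """
    idx = REG_INDEX[reg64]
    rex = 0x40
    if wide:
        rex |= 0x08              # REX.W selects 64-bit operand size
    if idx >= 8:
        rex |= 0x04 | 0x01       # REX.R and REX.B extend both operands
    # ModRM: mod=11 (register direct), reg and rm both hold the low 3 bits.
    modrm = 0xC0 | ((idx & 7) << 3) | (idx & 7)
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x31, modrm])


for reg in ("rax", "r12"):
    for wide in (True, False):
        name = "xorq" if wide else "xorl"
        print(reg, name, encode_xor_self(reg, wide).hex(" "))
```

For RAX the 32-bit form drops the prefix entirely (3 bytes become 2), while for R12 both forms are 3 bytes long because a REX prefix is still needed to address the register. The benefit for %r8-%r15 is therefore the dependency-breaking behaviour described in the summary, not code size.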
Relevant logs and/or screenshots
Example from Sysutils under -O4 - before:
...
.Lj2412:
movq $10,-48(%rbp)
leaq -48(%rbp),%r9
movq -32(%rbp),%rcx
leaq RTTI_$SYSUTILS_$$_TSTRINGARRAY(%rip),%rdx
# Peephole Optimization: movq $1,%r8 -> movl $1,%r8d (immediate can be represented with just 32 bits)
movl $1,%r8d
call fpc_dynarray_setlength
xorq %r12,%r12
xorq %r13,%r13
movq %rbp,%rcx
...
After:
...
.Lj2412:
movq $10,-48(%rbp)
leaq -48(%rbp),%r9
movq -32(%rbp),%rcx
leaq RTTI_$SYSUTILS_$$_TSTRINGARRAY(%rip),%rdx
# Peephole Optimization: movq $1,%r8 -> movl $1,%r8d (immediate can be represented with just 32 bits)
movl $1,%r8d
call fpc_dynarray_setlength
# Peephole Optimization: xorq %r12,%r12 -> xorl %r12d,%r12d (32-bit register recommended when zeroing 64-bit counterpart)
xorl %r12d,%r12d
# Peephole Optimization: xorq %r13,%r13 -> xorl %r13d,%r13d (32-bit register recommended when zeroing 64-bit counterpart)
xorl %r13d,%r13d
movq %rbp,%rcx
...
Edited by J. Gareth "Kit" Moreton