[x86] POPCNT and extraneous MOV optimisations
Summary
This merge request adds optimisations for the POPCNT and LZCNT instructions and closely related sequences:
- for "popcnt %reg1,%reg2; test %reg2,%reg2", the test instruction gets removed (it also works for "test %reg1,%reg1"). Similarly for "lzcnt %reg1,%reg2; test %reg2,%reg2" (although not "test %reg1,%reg1" this time). This is a simple extension of PostPeepholeOptTestOr.
- At the end of OptPass1MOV there is now a 'backward optimisation' that looks for "func (oper),%reg1; mov %reg1,%reg2; (dealloc %reg1)" and changes it to "func (oper),%reg2". Here, 'func' is any instruction carrying the Rop1 and Wop2 flags other than CMOV (which might not write to its destination). It was originally designed to optimise POPCNT assignments, but it also optimises sequences such as "cvtsd2si %xmm0,%rax; movq %rax,%rcx".
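To illustrate the first optimisation, a hypothetical sequence (the registers and label are made up for the example) - before:

```
popcntq	%rax,%rdx
testq	%rdx,%rdx
je	.Lzero
```

After, the conditional branch consumes the zero flag set by POPCNT itself:

```
popcntq	%rax,%rdx
je	.Lzero
```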
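The backward MOV optimisation can be sketched on a made-up POPCNT assignment (registers are hypothetical; the comment marks where the allocator releases %rax) - before:

```
popcntq	(%rbx),%rax
movq	%rax,%rcx
# dealloc %rax
```

After:

```
popcntq	(%rbx),%rcx
```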
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
- Some code sequences that use POPCNT and LZCNT are now more efficient.
- Some more extraneous MOV instructions are removed.
Additional notes
- Because the POPCNT/TEST optimisation is not triggered anywhere in the compiler, RTL or packages, two new tests have been introduced to evaluate the optimisations, namely "tests/test/opt/tpopcnt1.pp" and "tests/test/opt/tpopcnt2.pp".
Relevant logs and/or screenshots
The Variants unit, compiled with "-CpCOREAVX -OpCOREAVX -CfAVX", receives the most improvement, with many optimisations like the following. Before:
vdivsd %xmm0,%xmm6,%xmm0
vmulsd %xmm0,%xmm8,%xmm0
vcvtsd2si %xmm0,%rax
movq %rax,%rbx
jmp .Lj689
.p2align 4,,10
.p2align 3
.Lj708:
After:
vdivsd %xmm0,%xmm6,%xmm0
vmulsd %xmm0,%xmm8,%xmm0
vcvtsd2si %xmm0,%rbx
jmp .Lj689
.p2align 4,,10
.p2align 3
.Lj708: