[x86] Pass iteration and reference counting fix
Summary
This merge request fixes a couple of minor bugs in the x86 peephole optimizer:
- Fixed a bug where the `aoc_ForceNewIteration` flag was checked in Pass 2 instead of Pass 1, thereby having no effect.
- Fixed a bug in the "Mov0LblCmp0Je -> Mov0JmpLblCmp0Je" optimisation where a reference count was increased twice instead of once.
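To illustrate the second bug, here is a minimal, hypothetical sketch (not the actual compiler code) of why a label's reference count must be bumped exactly once per newly inserted jump: double-counting makes the label look more referenced than it is, which can block later cleanups such as dead-label removal.

```python
class Label:
    """A jump target with a count of how many instructions reference it."""
    def __init__(self, name):
        self.name = name
        self.refs = 0

def insert_jump(code, label):
    """Append a jump to `label`, bumping its reference count exactly once."""
    code.append(("jmp", label.name))
    label.refs += 1  # once per new reference; incrementing twice here
                     # would be the kind of over-count this MR fixes
```

Usage: after inserting one jump to a fresh label, its `refs` should be exactly 1.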
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
- Some optimised x86 programs under `-O3` and above aren't as efficient as they could be.
- No anomalies have been observed from the reference counting bug yet, but it is good to nip it in the bud.
What is the behavior after applying this patch?
- Optimisation is better under `-O3` and above because the `aoc_ForceNewIteration` flag is no longer erroneously ignored.
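To make the first fix concrete, here is a hedged, hypothetical sketch of a two-pass peephole driver (the names `pass1_step`, `pass2_step`, and the return convention are illustrative, not FPC's actual interface). The point is that a "force new iteration" request must be consumed by the loop around Pass 1; if only Pass 2 inspects it, the extra Pass 1 iteration never happens.

```python
def run_peephole(instructions, pass1_step, pass2_step):
    """Run Pass 1 to a fixed point, then Pass 2 once.

    `pass1_step(instructions, i)` returns True when it wants the whole
    of Pass 1 rerun (analogous to setting aoc_ForceNewIteration).
    Returns the number of Pass 1 iterations performed.
    """
    force_new_iteration = True
    pass1_runs = 0
    while force_new_iteration:
        # The flag is checked and cleared *here*, in the Pass 1 loop;
        # checking it only in Pass 2 would be too late to rerun Pass 1.
        force_new_iteration = False
        pass1_runs += 1
        for i in range(len(instructions)):
            if pass1_step(instructions, i):
                force_new_iteration = True
    for i in range(len(instructions)):
        pass2_step(instructions, i)
    return pass1_runs
```

With a step that requests one rerun, Pass 1 executes twice; this mirrors the cgobj example below, where the extra iteration lets further `mov` eliminations fire.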
Relevant logs and/or screenshots
A fair number of files show minor changes, usually thanks to DeepMOVOpt attempting to minimise pipeline stalls. The cgobj unit shows something a bit more profound under x86_64-win64 -O4 though - before:
```
.Lj589:
	movl	56(%rbp),%eax
	movl	%eax,40(%rsp)
	movl	8(%r13),%eax
	movl	%eax,32(%rsp)
	movzbl	%r12b,%r9d
	movzbl	%dil,%r8d
	movq	%rsi,%rdx
	movq	%rbx,%rcx
	movq	(%rbx),%rax
	call	*704(%rax)
	jmp	.Lj587
```
After:
```
.Lj589:
	movl	56(%rbp),%eax
	movl	%eax,40(%rsp)
	movl	8(%r13),%eax
	movl	%eax,32(%rsp)
	movzbl	%r9b,%r9d
	movzbl	%r8b,%r8d
	movq	(%rcx),%rax
	call	*704(%rax)
	jmp	.Lj587
```
In this case, with an extra iteration of Pass 1 allowed to run, `movq %rsi,%rdx` and `movq %rbx,%rcx` could be converted and removed completely since the register pairs are already identical.