[x86] Streamlining of x86's OptPass1LEA routine
Summary
This merge request was originally just going to be a refactor of OptPass1LEA, since the stack pointer is now properly tracked. However, it turned out there was room for overall improvements to the generated code at very little cost.
The second commit streamlines OptPass1LEA so that it avoids calling expensive routines such as RegModifiedBetween when they aren't needed (i.e. when p and hp1 are adjacent).
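The adjacency shortcut can be illustrated with a minimal Python sketch. This is not the actual Pascal compiler code; the `Insn` class, `writes` set, and function names are hypothetical stand-ins for the real instruction-list structures, but the control flow shows the idea: when p and hp1 are adjacent, nothing can sit between them, so the between-instructions scan is skipped outright.

```python
# Hypothetical model of a peephole pass over a linked instruction list.
# (Illustrative only; the real compiler code is Pascal and differs.)
class Insn:
    def __init__(self, writes=()):
        self.writes = set(writes)  # registers this instruction modifies
        self.next = None           # next instruction in the list

def reg_modified_between(reg, start, stop):
    # Expensive linear scan: walk every instruction strictly between
    # start and stop, checking whether any of them writes `reg`.
    insn = start.next
    while insn is not stop:
        if reg in insn.writes:
            return True
        insn = insn.next
    return False

def safe_to_merge(reg, p, hp1):
    # Fast path: if p and hp1 are adjacent, no instruction can modify
    # `reg` between them, so the scan above is unnecessary.
    if p.next is hp1:
        return True
    return not reg_modified_between(reg, p, hp1)
```

When the two instructions of interest are adjacent (the common case in these lea/lea pairs), the scan never runs at all, which is where the -O1/-O2 speed gain comes from.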
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
OptPass1LEA should see some speed gains under -O1 and -O2, along with some additional optimisations; -O4 benefits from a few as well.
Tests on i386-win32 and x86_64-win64 show no regressions.
Relevant logs and/or screenshots
In the System unit under -O4, an inefficiency was fixed - before:
...
.Lj3148:
leaq (%rbx,%rbx,1),%rax
leaq 6(%rax),%rdx
leaq 32(%rsp),%rcx
call SYSTEM_$$_GETMEM$POINTER$QWORD
...
After:
...
.Lj3148:
leaq 6(%rbx,%rbx,1),%rdx
leaq 32(%rsp),%rcx
call SYSTEM_$$_GETMEM$POINTER$QWORD
...
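To see why the merge is valid, the effective-address arithmetic can be checked directly. The sketch below (illustrative only, not compiler code) models an x86 LEA as disp + base + index*scale and confirms both sequences leave the same value in %rdx for an arbitrary %rbx; the merge is safe here because the intermediate %rax is not used afterwards.

```python
def lea(base=0, index=0, scale=1, disp=0):
    # x86 effective address: disp + base + index*scale
    return disp + base + index * scale

rbx = 0x1234
# Before: two dependent LEAs
rax = lea(base=rbx, index=rbx, scale=1)               # leaq (%rbx,%rbx,1),%rax
rdx_before = lea(base=rax, disp=6)                     # leaq 6(%rax),%rdx
# After: one merged LEA
rdx_after = lea(base=rbx, index=rbx, scale=1, disp=6)  # leaq 6(%rbx,%rbx,1),%rdx
assert rdx_before == rdx_after
```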
These little lea/lea optimisations are quite common. There are also some unexpected improvements - for example, before:
.Lj7043:
...
shlq $1,%rax
leaq 32(%rax),%rcx
...
After:
.Lj7043:
...
leaq 32(%rax,%rax,1),%rcx
...
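The same arithmetic check applies to the shl/lea case: shlq $1,%rax doubles %rax, and doubling can be expressed inside the LEA as base plus index at scale 1. The sketch below (illustrative only) verifies the equivalence; note that after the merge %rax itself is no longer doubled, which is only safe if the shifted value is dead afterwards.

```python
def lea(base=0, index=0, scale=1, disp=0):
    # x86 effective address: disp + base + index*scale
    return disp + base + index * scale

rax = 0x40
# Before: shlq $1,%rax (rax becomes 2*rax), then leaq 32(%rax),%rcx
rcx_before = lea(base=(rax << 1), disp=32)
# After: leaq 32(%rax,%rax,1),%rcx computes 2*rax + 32 in one step
rcx_after = lea(base=rax, index=rax, scale=1, disp=32)
assert rcx_before == rcx_after
```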
Under -O2, the System unit gains an optimisation in which two mergeable instructions are separated by an unrelated third instruction - before:
.Lj6565:
movq 32(%rsp),%rax
leaq 1(%rsi),%rdx
movslq %edi,%rdi
addq %rdi,%rdx
movq $0,(%rax,%rdx,8)
...
After:
.Lj6565:
movq 32(%rsp),%rax
movslq %edi,%rdi
leaq 1(%rsi,%rdi),%rdx ; <-- the LEA and the ADD (which may itself have been a LEA at this point) are merged
movq $0,(%rax,%rdx,8)
...
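This merge can likewise be checked arithmetically. The illustrative sketch below models the before sequence (LEA then ADD, with the movslq reordered past the LEA since it does not touch %rsi or %rdx's source) against the single merged LEA, assuming sample register values; %rdi is taken as already sign-extended.

```python
def lea(base=0, index=0, scale=1, disp=0):
    # x86 effective address: disp + base + index*scale
    return disp + base + index * scale

rsi, rdi = 100, 7  # sample values; rdi already sign-extended by movslq
# Before: leaq 1(%rsi),%rdx then addq %rdi,%rdx
rdx_before = lea(base=rsi, disp=1) + rdi
# After: leaq 1(%rsi,%rdi),%rdx folds the ADD into the LEA's index
rdx_after = lea(base=rsi, index=rdi, scale=1, disp=1)
assert rdx_before == rdx_after
```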