[x86 / Refactor (mostly)] Mov2Nop 8 efficiency check and refactor for maintainability
Summary
This merge request is twofold:
- The first commit adds some code to the Pass 2 loop so the third commit works properly.
- The second commit refactors the `TryMovArith2Lea` optimisation subroutine to be more self-contained and less reliant on external state; e.g. the check to see if the value is in range (and is actually a value) is now performed inside `TryMovArith2Lea`, along with the configuration of the register usage flags in `TmpUsedRegs`.
- The third commit focuses on the `movl %reg1d,%reg2d; movq %reg2q,%reg3q` optimisation (named `MovlMovq2MovlMovl 1`), making it deferred in most cases under `-O3` so it only runs on a second iteration of Pass 2 (it will always run on `-O2` and under because Pass 2 only runs once there). This has been shown to improve code generation when information about the 64-bit assignment has not been lost. Logically this would fit better in a "Pass 3" stage, but the only one that exists, the post-peephole optimisation stage, should not be used for complex optimisations (although if preferred by @FPK2, it can be written this way).
System
- Processor architecture: i386, x86-64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
`OptPass2MOV` is now slightly more maintainable with its nested `TryMovArith2Lea` function. Some code improvements now occur in rare situations at `-O3` and above; `-O2` and under should see no changes in binary output (except for the modified compiler source files).
Relevant logs and/or screenshots
By deferring `MovlMovq2MovlMovl 1`, another optimisation gets performed first that converts an unnecessary `movzbl` into a more efficient `movb` instruction in `bzip2` (`x86_64-win64`, `-O4`). Before:
```
...
.Lj21:
	movzbl	84(%rbx),%r9d
	movzbl	%sil,%edx
	movl	$8,%ecx
	subl	%edx,%ecx
	shrl	%cl,%r9d
	movb	%r9b,%al
	movzbl	%sil,%ecx
	shlb	%cl,84(%rbx)
...
```
After:
```
...
.Lj21:
	movzbl	84(%rbx),%r9d
	movzbl	%sil,%edx
	movl	$8,%ecx
	subl	%edx,%ecx
	shrl	%cl,%r9d
	movb	%r9b,%al
	movb	%sil,%cl
	shlb	%cl,84(%rbx)
...
```
A similar optimisation also appears in `bzip2stream`.