[x86] Inefficiency fixes to OptPass1TEST
Summary
This merge request fixes some inefficiencies that were unearthed by !547 (merged). Occasionally, the TEST/JNE/TEST/JNE merging optimisation depended on the second jump not having been optimised yet, because it only recognised a sequence of the form test $x,%reg/ref; jne .lbl1; test $y,%reg/ref; jne .lbl1; jmp .lbl2; .lbl1:
If the second jump was optimised first, or was already in an optimised form, the entire sequence was left unoptimised.
Therefore, besides some other minor improvements in the method, the following variant is now optimised as well: test $x,%reg/ref; jne .lbl1; test $y,%reg/ref; je .lbl2; .lbl1:
becomes test $(x or y),%reg/ref; je .lbl2; .lbl1: (where x or y is the bitwise OR of the two immediates).
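To make the pattern concrete, here is a minimal Python sketch of the new variant's rewrite. It is not FPC's implementation: the tuple-based instruction list and the helper name merge_test_pair are hypothetical, and a real peephole pass also has to check conditions this sketch ignores (for example, whether any flags other than ZF are used afterwards). Note that .lbl1 is kept, since other jumps may still target it.

```python
from typing import List, Tuple

# Hypothetical representation: (opcode, operand1, operand2); labels are ("label", name, None).
Instr = Tuple[str, object, object]


def merge_test_pair(code: List[Instr]) -> List[Instr]:
    """Rewrite 'test x,op; jne L1; test y,op; je L2; L1:' as 'test x|y,op; je L2; L1:'."""
    out = list(code)
    i = 0
    while i + 4 < len(out):
        t1, j1, t2, j2, lbl = out[i:i + 5]
        if (t1[0] == "test" and t2[0] == "test"
                and isinstance(t1[1], int) and isinstance(t2[1], int)  # both immediates
                and t1[2] == t2[2]                 # same register/reference tested
                and j1[0] == "jne" and j2[0] == "je"
                and lbl[0] == "label" and j1[1] == lbl[1]):  # jne skips over the je
            # Replace the first four instructions; the label stays in place.
            out[i:i + 4] = [("test", t1[1] | t2[1], t1[2]), j2]
        else:
            i += 1
    return out
```

Applied to the ncal sequence shown further down (ignoring the LEA), this yields a single test of 192 followed by je .Lj1189 and the retained .Lj1195 label.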
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
TEST/JNE/TEST/Jcc sequences are now merged whether or not the second jump has already been optimised, and a few other inefficiencies in OptPass1TEST have been resolved.
Relevant logs and/or screenshots
Besides the cases uncovered in !547 (merged), there are a handful of improvements that are independent of that merge request. For example, in the ncal
unit (x86_64-win64, -O4), before:
...
.Lj1191:
leaq U_$GLOBALS_$$_CURRENT_SETTINGS(%rip),%rax
testb $64,89(%rax)
jne .Lj1195
testb $128,89(%rax)
je .Lj1189
.Lj1195:
...
After:
...
.Lj1191:
testb $192,U_$GLOBALS_$$_CURRENT_SETTINGS+89(%rip)
je .Lj1189
...
Many similar sequences appear throughout the compiler source.
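The merge is valid because branching to .lbl2 only when both (v and x) = 0 and (v and y) = 0 is the same as branching when (v and (x or y)) = 0. A quick exhaustive check over every 8-bit value, using the masks from the ncal example (64 or 128 = 192), can be sketched as follows; it compares branch targets only, not the resulting flag values, and the function names are illustrative.

```python
def original_target(value: int, x: int, y: int) -> str:
    # test $x; jne .lbl1; test $y; je .lbl2; .lbl1:
    if (value & x) != 0:
        return "lbl1"
    if (value & y) == 0:
        return "lbl2"
    return "lbl1"


def merged_target(value: int, x: int, y: int) -> str:
    # test $(x or y); je .lbl2; .lbl1:
    return "lbl2" if (value & (x | y)) == 0 else "lbl1"


def check_byte_masks(x: int, y: int) -> bool:
    """Return True if both sequences jump to the same place for every byte value."""
    return all(original_target(v, x, y) == merged_target(v, x, y) for v in range(256))
```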