[AArch64] "x and (not y)" now uses BIC, and similar for "or not" and "xor not"
Summary
This merge request takes advantage of the AArch64 logical instructions that invert their second operand to simplify compound expressions, namely BIC (and not), ORN (or not) and EON (xor not). The code and principles are partially based on a similar merge request for x86, !305 (merged).
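For reference, the three instructions can be modelled in a few lines of Python. This is a minimal sketch, not the compiler's code: the helper names (mask, mvn, bic, orn, eon) are invented for illustration, and only the plain register forms are modelled, not the shifted-operand variants. Each function computes in one step what previously took an MVN followed by an AND, ORR or EOR.

```python
def mask(width: int) -> int:
    """All-ones value for a register of `width` bits (32 for Wn, 64 for Xn)."""
    return (1 << width) - 1


def mvn(y: int, width: int = 64) -> int:
    """MVN d, y: bitwise NOT, truncated to the register width."""
    return ~y & mask(width)


def bic(x: int, y: int, width: int = 64) -> int:
    """BIC d, x, y: x AND (NOT y), the result of MVN + AND."""
    return x & mvn(y, width)


def orn(x: int, y: int, width: int = 64) -> int:
    """ORN d, x, y: x OR (NOT y), the result of MVN + ORR."""
    return (x | mvn(y, width)) & mask(width)


def eon(x: int, y: int, width: int = 64) -> int:
    """EON d, x, y: x XOR (NOT y), the result of MVN + EOR."""
    return (x ^ mvn(y, width)) & mask(width)


print(hex(bic(0xFF, 0x0F, 32)))  # 0xf0
print(hex(orn(0x00, 0x0F, 32)))  # 0xfffffff0
print(hex(eon(0xFF, 0x0F, 32)))  # 0xffffff0f
```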
A number of new tests have also been introduced. They test xor not (which doesn't appear anywhere in the packages, compiler or RTL) and the correctness of 8-bit and 16-bit operations whose results must be zero-extended, since AArch64 code seems to have problems when the out-of-range bits of the result are not all 0s. Without the additional line of code in the commit that zero-extends the result, teontest1 and teontest2 fail.
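One plausible explanation for the zero-extension requirement is sketched below, assuming that byte and word values are held zero-extended in 32-bit W registers (AArch64 has no 8-bit or 16-bit general-purpose registers). NOT y turns the upper bits of y from 0 into 1, so ORN and EON produce results whose upper bits are all 1s, whereas BIC takes its upper bits from x and leaves them clear. The function names are hypothetical and the model is not the compiler's code.

```python
MASK32 = 0xFFFFFFFF  # W registers are 32 bits wide


def orn32(x: int, y: int) -> int:
    """ORN Wd, Wx, Wy: x OR (NOT y) over all 32 bits."""
    return (x | ~y) & MASK32


def eon32(x: int, y: int) -> int:
    """EON Wd, Wx, Wy: x XOR (NOT y) over all 32 bits."""
    return (x ^ ~y) & MASK32


def bic32(x: int, y: int) -> int:
    """BIC Wd, Wx, Wy: x AND (NOT y) over all 32 bits."""
    return x & ~y & MASK32


def zero_extend(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, as a UXTB (8) or UXTH (16) would."""
    return value & ((1 << bits) - 1)


# Two byte operands, already zero-extended in W registers.
a, b = 0x5A, 0x0F
print(hex(orn32(a, b)))                  # 0xfffffffa: upper 24 bits are 1s
print(hex(zero_extend(orn32(a, b), 8)))  # 0xfa: the correct 8-bit result
print(hex(eon32(a, b)))                  # 0xffffffaa
print(hex(bic32(a, b)))                  # 0x50: upper bits already clear
```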
System
- Processor architecture: AArch64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Code generation should improve in many situations involving expressions of the form x and (not y), x or (not y) and x xor (not y).
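The core transformation can be illustrated with a toy peephole pass over a straight-line list of instructions. This is a hypothetical sketch, not FPC's optimizer: the tuple instruction format and the names fuse_not and is_dead_after are invented for illustration, and the liveness test is deliberately conservative (it ignores branches and treats the end of the list as "still live"). It rewrites MVN t,y followed by AND/ORR/EOR d,x,t into BIC/ORN/EON d,x,y when the temporary t is no longer needed.

```python
from typing import List, Optional, Tuple

# Toy instruction format: (opcode, dest, src1[, src2]); operands are register names.
Instr = Tuple[str, ...]

# Logical op -> fused "op not" form on AArch64
FUSED = {"and": "bic", "orr": "orn", "eor": "eon"}


def is_dead_after(reg: str, code: List[Instr], start: int) -> bool:
    """Toy liveness: True only if `reg` is overwritten before it is read.
    Reaching the end of the list counts as live (conservative)."""
    for instr in code[start:]:
        if reg in instr[2:]:      # read as a source operand
            return False
        if instr[1] == reg:       # overwritten without being read
            return True
    return False


def fuse_not(code: List[Instr]) -> List[Instr]:
    """Fuse 'mvn t,y' + 'and/orr/eor d,x,t' into 'bic/orn/eon d,x,y'."""
    out: List[Instr] = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt: Optional[Instr] = code[i + 1] if i + 1 < len(code) else None
        if (cur[0] == "mvn" and len(cur) == 3
                and nxt is not None and nxt[0] in FUSED and len(nxt) == 4):
            _, t, y = cur
            op, d, a, b = nxt
            # AND/ORR/EOR are commutative, so t may be either source operand
            x = b if a == t else (a if b == t else None)
            # Reject 'op d,t,t' (x would be the negated t) and keep t if it is still needed
            if x is not None and x != t and (d == t or is_dead_after(t, code, i + 2)):
                out.append((FUSED[op], d, x, y))
                i += 2
                continue
        out.append(cur)
        i += 1
    return out


# Align example: mvn x0,x2 / and x0,x3,x0  ->  bic x0,x3,x2
print(fuse_not([("mvn", "x0", "x2"), ("and", "x0", "x3", "x0")]))
```

Because AND, ORR and EOR are commutative, the inverted value may appear as either source operand; the fused instruction always takes it as the second operand, since that is the one BIC, ORN and EON invert.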
Relevant logs and/or screenshots
A large number of units receive improvements. A simple example from the Align function in the System unit (aarch64-linux, -O4), before:
...
.Lc332:
sub x2,x1,#1
add x3,x2,x0
and x0,x2,x1
cbnz x0,.Lj373
mvn x0,x2
and x0,x3,x0
b .Lj374
.Lj373:
udiv x2,x3,x1
cbnz x1,.Lj375
bl FPC_DIVBYZERO
...
After, the MVN instruction is removed and the AND is changed to BIC:
...
.Lc332:
sub x2,x1,#1
add x3,x2,x0
and x0,x2,x1
cbnz x0,.Lj373
bic x0,x3,x2
b .Lj374
.Lj373:
udiv x2,x3,x1
cbnz x1,.Lj375
bl FPC_DIVBYZERO
...
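The fast path in this listing rounds an address up to a power-of-two alignment, and the BIC clears the low bits. The following Python sketch mirrors that path instruction by instruction, with 64-bit wrap-around. The name align_pow2 is hypothetical, the register roles (x0 = address, x1 = alignment) are inferred from the listing, and the non-power-of-two udiv path, which is elided above, is not modelled.

```python
MASK64 = (1 << 64) - 1  # X registers are 64 bits wide


def align_pow2(addr: int, alignment: int) -> int:
    """Round addr up to a multiple of a power-of-two alignment,
    following the fast path of the listing with 64-bit wrap-around."""
    mask = (alignment - 1) & MASK64       # sub x2,x1,#1
    biased = (addr + mask) & MASK64       # add x3,x2,x0
    if mask & alignment:                  # and x0,x2,x1 / cbnz x0,.Lj373
        raise ValueError("non-power-of-two alignment takes the udiv path")
    return biased & ~mask & MASK64        # bic x0,x3,x2 (previously mvn + and)


print(align_pow2(13, 8))       # 16
print(align_pow2(4096, 4096))  # 4096
```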
Later in the System unit, an and not operation with byte-sized operands is simplified (in this case, the EOR with #255 stands in for MVN), before:
...
.Lj1154:
sub x3,x3,#1
ldrb w4,[x1, x3]
eor w4,w4,#255
ldrb w5,[x0, x3]
and w4,w5,w4
strb w4,[x2, x3]
...
After:
...
.Lj1154:
sub x3,x3,#1
ldrb w5,[x1, x3]
ldrb w4,[x0, x3]
bic w4,w4,w5
strb w4,[x2, x3]
...
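Replacing the EOR with #255 by a full 32-bit inversion is safe here because both bytes are loaded with LDRB, which zero-extends them: the upper 24 bits of the result come from the AND with the other operand and are therefore 0 either way. A small exhaustive check in Python (function names hypothetical, register roles inferred from the listing) confirms that the two sequences leave the same value in w4.

```python
MASK32 = 0xFFFFFFFF  # W registers are 32 bits wide


def before(a: int, b: int) -> int:
    """Old sequence; a and b are the bytes loaded by ldrb (zero-extended)."""
    not_b = b ^ 0xFF               # eor w4,w4,#255: flips only the low 8 bits
    return (a & not_b) & MASK32    # and w4,w5,w4


def after(a: int, b: int) -> int:
    """New sequence: bic w4,w4,w5 inverts all 32 bits of b."""
    return a & ~b & MASK32         # upper 24 bits stay 0 because a's are 0


def sequences_agree() -> bool:
    # Exhaustive over every pair of byte values: the full W-register
    # results match, so the byte stored by strb matches as well.
    return all(before(a, b) == after(a, b) for a in range(256) for b in range(256))


print(sequences_agree())  # True
```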
In the md5 unit (even though it gets its own assembly version in !523), many or not operations appear, before:
...
movz w3,#22117
movk w3,#50348,lsl #16
add w3,w4,w3
ror w3,w3,#9
add w3,w2,w3
mvn w4,w1
orr w4,w3,w4
eor w4,w2,w4
add w4,w0,w4
ldr w0,[sp]
add w4,w0,w4
movz w0,#8772
movk w0,#62505,lsl #16
...
After (some register allocations are different in places as well):
...
movz w3,#22117
movk w3,#50348,lsl #16
add w3,w4,w3
ror w3,w3,#9
add w3,w2,w3
orn w4,w3,w1
eor w4,w2,w4
add w4,w0,w4
ldr w0,[sp]
add w4,w0,w4
movz w0,#8772
movk w0,#62505,lsl #16
...
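The MVN/ORR/EOR sequence in this listing appears to match the shape of MD5's auxiliary function I(X, Y, Z) = Y xor (X or (not Z)) from RFC 1321, with w3, w2 and w1 playing X, Y and Z. A short Python sketch (function names hypothetical) shows that folding the NOT into ORN leaves the result unchanged.

```python
MASK32 = 0xFFFFFFFF  # MD5 works on 32-bit words


def md5_i_mvn(x: int, y: int, z: int) -> int:
    """I(X, Y, Z) as in the old listing: three instructions."""
    t = ~z & MASK32   # mvn w4,w1     (not Z)
    t = x | t         # orr w4,w3,w4  (X or not Z)
    return y ^ t      # eor w4,w2,w4  (Y xor ...)


def md5_i_orn(x: int, y: int, z: int) -> int:
    """I(X, Y, Z) with the NOT folded into ORN: two instructions."""
    t = (x | ~z) & MASK32  # orn w4,w3,w1
    return y ^ t           # eor w4,w2,w4


# Sample 32-bit words (including the MD5 initial chaining values)
words = [0x00000000, 0xFFFFFFFF, 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
print(all(md5_i_mvn(x, y, z) == md5_i_orn(x, y, z)
          for x in words for y in words for z in words))  # True
```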