
[AArch64] "x and (not y)" now uses BIC, and similar for "or not" and "xor not"

Summary

This merge request takes advantage of AArch64's combined logical instructions to simplify compound expressions, specifically BIC (and not), ORN (or not) and EON (xor not). The code and principles are partially based on a similar merge request for x86, !305 (merged).
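
For reference, the semantics of the three instructions can be modelled in C as below. This is only an illustrative sketch of what each instruction computes on 64-bit registers; the helper names `bic`, `orn` and `eon` are hypothetical and are not taken from the compiler sources.

```c
#include <stdint.h>

/* Illustrative models of the AArch64 instructions (helper names are hypothetical). */

/* BIC Xd, Xn, Xm : Xd = Xn and (not Xm) */
static inline uint64_t bic(uint64_t x, uint64_t y) { return x & ~y; }

/* ORN Xd, Xn, Xm : Xd = Xn or (not Xm) */
static inline uint64_t orn(uint64_t x, uint64_t y) { return x | ~y; }

/* EON Xd, Xn, Xm : Xd = Xn xor (not Xm) */
static inline uint64_t eon(uint64_t x, uint64_t y) { return x ^ ~y; }
```

In each case the separate MVN that would otherwise compute `not y` into a temporary register is folded into the logical instruction itself.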

A number of new tests have also been introduced. They cover xor not (which doesn't appear anywhere in the packages, compiler or RTL) and the correctness of zero-extending operations on 8-bit and 16-bit operands, since AArch64 seems to have problems when the out-of-range bits of the result are not all 0s. Without the additional line of code in the commit that zero-extends the result, teontest1 and teontest2 fail.
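
The zero-extension issue can be illustrated with a small C model. It is a sketch under the assumption that byte-sized operands are held zero-extended in 32-bit W registers; the function names are hypothetical and this is not the code from the commit. Because EON inverts all 32 bits of the second operand, the upper 24 bits of the raw result end up set and have to be cleared again.

```c
#include <stdint.h>

/* EON as executed on a 32-bit W register: it has no notion of an 8-bit operand. */
static inline uint32_t eon_w(uint32_t x, uint32_t y) { return x ^ ~y; }

/* Byte-sized "x xor (not y)" without masking: bits 8..31 of the result are all 1s. */
static inline uint32_t xor_not_u8_raw(uint8_t x, uint8_t y) { return eon_w(x, y); }

/* The same operation followed by a zero-extension of the low byte (and with 0xFF). */
static inline uint32_t xor_not_u8(uint8_t x, uint8_t y) { return eon_w(x, y) & 0xFFu; }
```

With `x = 0x0F` and `y = 0xF0`, the raw form yields `0xFFFFFF00` while the zero-extended form yields `0x00`. BIC does not need the extra step in this model, since the zero upper bits of `x` clear the upper bits of the result.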

System

  • Processor architecture: AArch64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Code generation should improve in many situations involving expressions of the form x and (not y), x or (not y) and x xor (not y).

Relevant logs and/or screenshots

A large number of units receive improvements. A simple example from the Align method from the System unit (aarch64-linux, -O4) - before:

	...
.Lc332:
	sub	x2,x1,#1
	add	x3,x2,x0
	and	x0,x2,x1
	cbnz	x0,.Lj373
	mvn	x0,x2
	and	x0,x3,x0
	b	.Lj374
.Lj373:
	udiv	x2,x3,x1
	cbnz	x1,.Lj375
	bl	FPC_DIVBYZERO
	...

After, the MVN instruction is removed and the AND changed to BIC:

	...
.Lc332:
	sub	x2,x1,#1
	add	x3,x2,x0
	and	x0,x2,x1
	cbnz	x0,.Lj373
	bic	x0,x3,x2
	b	.Lj374
.Lj373:
	udiv	x2,x3,x1
	cbnz	x1,.Lj375
	bl	FPC_DIVBYZERO
	...
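
The fast path in the listing above corresponds roughly to the following C sketch. It covers only the power-of-two branch that the listing shows in full; the function name and parameter names are hypothetical, and the division-based branch at .Lj373 is left out because the listing is truncated there.

```c
#include <stdint.h>

/* Rounds addr up to a multiple of alignment, assuming alignment is a power of two. */
static inline uint64_t align_up_pow2(uint64_t addr, uint64_t alignment) {
    uint64_t mask = alignment - 1;  /* sub x2,x1,#1 */
    uint64_t sum  = addr + mask;    /* add x3,x2,x0 */
    return sum & ~mask;             /* mvn + and before, a single bic after */
}
```

For example, `align_up_pow2(13, 8)` gives 16, and an already aligned address such as 16 is returned unchanged.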

Later in the System unit, an and not operation with byte-sized operands is simplified (in this case, the EOR operation stands in for MVN) - before:

	...
.Lj1154:
	sub	x3,x3,#1
	ldrb	w4,[x1, x3]
	eor	w4,w4,#255
	ldrb	w5,[x0, x3]
	and	w4,w5,w4
	strb	w4,[x2, x3]
	...

After:

	...
.Lj1154:
	sub	x3,x3,#1
	ldrb	w5,[x1, x3]
	ldrb	w4,[x0, x3]
	bic	w4,w4,w5
	strb	w4,[x2, x3]
	...

In the md5 unit (even though it gets its own assembly version in !523), many or not operations appear - before:

	...
	movz	w3,#22117
	movk	w3,#50348,lsl #16
	add	w3,w4,w3
	ror	w3,w3,#9
	add	w3,w2,w3
	mvn	w4,w1
	orr	w4,w3,w4
	eor	w4,w2,w4
	add	w4,w0,w4
	ldr	w0,[sp]
	add	w4,w0,w4
	movz	w0,#8772
	movk	w0,#62505,lsl #16
	...

After (some register allocations are different in places as well):

	...
	movz	w3,#22117
	movk	w3,#50348,lsl #16
	add	w3,w4,w3
	ror	w3,w3,#9
	add	w3,w2,w3
	orn	w4,w3,w1
	eor	w4,w2,w4
	add	w4,w0,w4
	ldr	w0,[sp]
	add	w4,w0,w4
	movz	w0,#8772
	movk	w0,#62505,lsl #16
	...
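
The ORN/EOR pair in this listing appears to correspond to the MD5 auxiliary function I from RFC 1321, which contains an or not by definition. A minimal C sketch follows; the function name `md5_i` is hypothetical, and the mapping of registers to the arguments is an assumption read off the listing rather than something confirmed by the unit's source.

```c
#include <stdint.h>

/* MD5 auxiliary function I(x, y, z) = y xor (x or (not z)), as given in RFC 1321. */
static inline uint32_t md5_i(uint32_t x, uint32_t y, uint32_t z) {
    uint32_t t = x | ~z;  /* orn w4,w3,w1  (assumed: x in w3, z in w1) */
    return y ^ t;         /* eor w4,w2,w4  (assumed: y in w2) */
}
```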
