[AArch64] "x and (not y)" now uses BIC, and similar for "or not" and "xor not" (!524)

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:bic-node into main Nov 06, 2023

Summary

This merge request takes advantage of AArch64's combined logical instructions to simplify compound expressions, namely BIC (and not), ORN (or not) and EON (xor not). The code and principles are partially based on a similar merge request for x86, !305 (merged).
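
As a rough illustration of what the three instructions compute, the following Python sketch models 32-bit registers with a mask. The function names and the `MASK32` constant are invented for this example and are not taken from the compiler source.

```python
MASK32 = 0xFFFFFFFF  # models a 32-bit W register (assumption of this sketch)

def mvn(y):
    """Bitwise NOT, truncated to 32 bits."""
    return ~y & MASK32

def bic(x, y):
    """BIC: x and (not y)."""
    return x & ~y & MASK32

def orn(x, y):
    """ORN: x or (not y)."""
    return (x | ~y) & MASK32

def eon(x, y):
    """EON: x xor (not y)."""
    return (x ^ ~y) & MASK32

x, y = 0x12345678, 0x0000FFFF

# Each combined instruction gives the same value as MVN followed by AND/ORR/EOR.
print(hex(bic(x, y)), bic(x, y) == x & mvn(y))    # 0x12340000 True
print(hex(orn(x, y)), orn(x, y) == x | mvn(y))    # 0xffff5678 True
print(hex(eon(x, y)), eon(x, y) == x ^ mvn(y))    # 0xedcb5678 True
```

The point of the sketch is only that the separate inversion step becomes unnecessary; it says nothing about how the compiler detects the pattern.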

A number of new tests have also been introduced. They cover xor not (which doesn't appear anywhere in the packages, compiler or RTL) and the correctness of 8-bit and 16-bit operations that require zero-extension, since AArch64 seems to have problems when the out-of-range bits of the result are not all 0s. Without the additional line of code in the commit that zero-extends the result, teontest1 and teontest2 fail.
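
One way to see why narrow operands need extra care is to model an 8-bit value held in a 32-bit register. The Python sketch below is only an illustration under that assumption, not the compiler's logic, and all names in it are hypothetical. With zero-extended inputs, an and-not keeps the upper bits clear, whereas an xor-not sets them, so the result has to be zero-extended again.

```python
MASK32 = 0xFFFFFFFF  # models a 32-bit W register (assumption of this sketch)

def bic32(x, y):
    """x and (not y) on a 32-bit register."""
    return x & ~y & MASK32

def eon32(x, y):
    """x xor (not y) on a 32-bit register."""
    return (x ^ ~y) & MASK32

def zero_extend8(v):
    """Keep only the low 8 bits, clearing everything above them."""
    return v & 0xFF

x, y = 0x5A, 0x0F            # two byte-sized values, already zero-extended

raw = eon32(x, y)
print(hex(raw))               # 0xffffffaa: bits 8..31 are all set
print(hex(zero_extend8(raw))) # 0xaa: the expected byte result
print(hex(bic32(x, y)))       # 0x50: upper bits stay clear without extra work
```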

System

Processor architecture: AArch64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Code generation should improve wherever expressions of the form x and (not y), x or (not y) and x xor (not y) appear.

Relevant logs and/or screenshots

A large number of units receive improvements. A simple example from the Align method in the System unit (aarch64-linux, -O4) - before:

	...
.Lc332:
	sub	x2,x1,#1
	add	x3,x2,x0
	and	x0,x2,x1
	cbnz	x0,.Lj373
	mvn	x0,x2
	and	x0,x3,x0
	b	.Lj374
.Lj373:
	udiv	x2,x3,x1
	cbnz	x1,.Lj375
	bl	FPC_DIVBYZERO
	...

After, the MVN instruction is removed and the AND changed to BIC:

	...
.Lc332:
	sub	x2,x1,#1
	add	x3,x2,x0
	and	x0,x2,x1
	cbnz	x0,.Lj373
	bic	x0,x3,x2
	b	.Lj374
.Lj373:
	udiv	x2,x3,x1
	cbnz	x1,.Lj375
	bl	FPC_DIVBYZERO
	...
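
The fast path in the listings above can be read as the usual round-up-to-a-power-of-two idiom. The Python sketch below mirrors only the instructions shown (the `udiv` path is elided in the listing and is not modelled); `align_up_pow2` and `MASK64` are names invented for this illustration, not identifiers from the System unit.

```python
MASK64 = (1 << 64) - 1  # models a 64-bit X register (assumption of this sketch)

def align_up_pow2(addr, alignment):
    """Mirror of the fast path: round addr up to a power-of-two alignment."""
    tmp = (alignment - 1) & MASK64          # sub x2,x1,#1
    total = (addr + tmp) & MASK64           # add x3,x2,x0
    if (tmp & alignment) != 0:              # and x0,x2,x1 / cbnz x0,...
        # The listing branches to the udiv path here; it is not modelled.
        raise ValueError("alignment is not a power of two")
    return total & ~tmp & MASK64            # bic x0,x3,x2 (was mvn + and)

print(align_up_pow2(13, 8))   # 16
print(align_up_pow2(16, 8))   # 16
```

The single `bic` covers the `total & ~tmp` step, which previously needed a separate `mvn` to build the inverted mask.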

Later in the System unit, an and not operation with byte-sized operands is simplified (in this case, the EOR operation stands in for MVN) - before:

	...
.Lj1154:
	sub	x3,x3,#1
	ldrb	w4,[x1, x3]
	eor	w4,w4,#255
	ldrb	w5,[x0, x3]
	and	w4,w5,w4
	strb	w4,[x2, x3]
	...

After:

	...
.Lj1154:
	sub	x3,x3,#1
	ldrb	w5,[x1, x3]
	ldrb	w4,[x0, x3]
	bic	w4,w4,w5
	strb	w4,[x2, x3]
	...
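
For byte-sized values, XOR with 255 inverts exactly the low 8 bits, which is why it can stand in for a NOT in the earlier listing. The following Python sketch, with hypothetical function names, checks that both forms of the loop body produce the same stored bytes.

```python
def and_not_bytes_eor(a, b):
    """Older form: invert each byte of b with XOR 255, then AND with a."""
    return bytes(p & (q ^ 0xFF) for p, q in zip(a, b))

def and_not_bytes_bic(a, b):
    """Newer form: a single and-not per byte; & 0xFF models strb storing one byte."""
    return bytes(p & ~q & 0xFF for p, q in zip(a, b))

a = bytes([0xFF, 0x0F, 0xAA, 0x00])
b = bytes([0x0F, 0x0F, 0x55, 0xFF])

print(and_not_bytes_bic(a, b).hex())                       # f000aa00
print(and_not_bytes_eor(a, b) == and_not_bytes_bic(a, b))  # True
```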

In the md5 unit (even though it gets its own assembly version in !523), many or not operations appear - before:

	...
	movz	w3,#22117
	movk	w3,#50348,lsl #16
	add	w3,w4,w3
	ror	w3,w3,#9
	add	w3,w2,w3
	mvn	w4,w1
	orr	w4,w3,w4
	eor	w4,w2,w4
	add	w4,w0,w4
	ldr	w0,[sp]
	add	w4,w0,w4
	movz	w0,#8772
	movk	w0,#62505,lsl #16
	...

After (some register allocations are different in places as well):

	...
	movz	w3,#22117
	movk	w3,#50348,lsl #16
	add	w3,w4,w3
	ror	w3,w3,#9
	add	w3,w2,w3
	orn	w4,w3,w1
	eor	w4,w2,w4
	add	w4,w0,w4
	ldr	w0,[sp]
	add	w4,w0,w4
	movz	w0,#8772
	movk	w0,#62505,lsl #16
	...
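
The `orn`/`eor` pair in this listing appears to match MD5's auxiliary function I(X, Y, Z) = Y xor (X or (not Z)) from RFC 1321. The Python sketch below writes that function out for 32-bit words; the mapping of registers to arguments is an inference from the listing, and `md5_i` is a name chosen for this example rather than one from the md5 unit.

```python
MASK32 = 0xFFFFFFFF  # MD5 works on 32-bit words

def md5_i(x, y, z):
    """I(X, Y, Z) = Y xor (X or (not Z)), truncated to 32 bits."""
    x_or_not_z = (x | ~z) & MASK32   # orn w4,w3,w1  (was mvn + orr)
    return y ^ x_or_not_z            # eor w4,w2,w4

print(hex(md5_i(0x00000000, 0x00000000, 0x00000000)))  # 0xffffffff
print(hex(md5_i(0xFFFFFFFF, 0x12345678, 0x00000000)))  # 0xedcba987
print(hex(md5_i(0x0F0F0F0F, 0x00000000, 0xFFFFFFFF)))  # 0xf0f0f0f
```

Because this pattern recurs in every step of that round, removing one `mvn` per step adds up across the unit.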
