[ARM / AArch64 / Bug Fix] Undefined register fix (webtbs/tw22869) (!606)

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:aarch64-tw22869-fix into main Mar 04, 2024

Summary

This merge request fixes an issue where certain combinations of peephole optimisations can remove both instructions in a mov r1,r0; mov r1,r1 pair, leaving r1 undefined. This caused webtbs/tw22869 to fail.

Specifically, if an instruction such as ubfx x1,x1,#0,#64 is optimised to mov x1,x1, an earlier mov x1,x0 can sometimes be optimised out before mov x1,x1 is itself removed. This happens because the peephole optimizer assumes that mov x1,x0 is a dead store, which would be true if the next mov x1,... did not read from x1 itself.
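
The failure mode described above can be reproduced with a small model. The following is only a sketch in Python, not the compiler's actual code: the Instr type, the helper names and the two dead-store checks are hypothetical and exist purely to illustrate why the "next instruction overwrites the register" test is insufficient when that instruction also reads the register.

```python
from typing import Callable, FrozenSet, Iterable, List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical, heavily simplified instruction model."""
    text: str
    is_mov: bool
    writes: str
    reads: FrozenSet[str]


def mov(dst: str, src: str) -> Instr:
    return Instr(f"mov {dst},{src}", True, dst, frozenset({src}))


def ldr(dst: str, base: str) -> Instr:
    return Instr(f"ldr {dst},[{base}]", False, dst, frozenset({base}))


def dead_store_naive(code: List[Instr], i: int) -> bool:
    # Assumed faulty check: a mov is "dead" if the next instruction
    # overwrites its destination, regardless of what that instruction reads.
    nxt = code[i + 1] if i + 1 < len(code) else None
    return code[i].is_mov and nxt is not None and nxt.writes == code[i].writes


def dead_store_checked(code: List[Instr], i: int) -> bool:
    # Assumed corrected check: the overwriting instruction must not read
    # the register that the candidate mov defines.
    return dead_store_naive(code, i) and code[i].writes not in code[i + 1].reads


def is_self_move(ins: Instr) -> bool:
    # mov x1,x1 has no effect and can be dropped on its own.
    return ins.is_mov and ins.reads == {ins.writes}


def peephole(code: Iterable[Instr],
             is_dead_store: Callable[[List[Instr], int], bool]) -> List[Instr]:
    # Repeatedly remove dead stores and self-moves until nothing changes.
    result = list(code)
    i = 0
    while i < len(result):
        if is_dead_store(result, i) or is_self_move(result[i]):
            del result[i]
            i = 0  # restart the scan after every removal
        else:
            i += 1
    return result


def undefined_reads(code: Iterable[Instr],
                    defined_on_entry: Iterable[str]) -> List[Tuple[str, str]]:
    # Report every (instruction, register) read that has no earlier definition.
    defined = set(defined_on_entry)
    problems: List[Tuple[str, str]] = []
    for ins in code:
        problems += [(ins.text, r) for r in sorted(ins.reads) if r not in defined]
        defined.add(ins.writes)
    return problems


# mov x1,x0 followed by the mov x1,x1 that a full-width ubfx turned into.
routine = [mov("x1", "x0"), mov("x1", "x1"), ldr("x1", "x1")]
```

With only x0 defined on entry, peephole(routine, dead_store_naive) leaves just ldr x1,[x1] and undefined_reads flags x1, whereas peephole(routine, dead_store_checked) keeps mov x1,x0. The sketch does not model the later MovLdr2Ldr folding into ldr x1,[x0].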

System

Operating system: Linux (Raspberry Pi OS) and others
Processor architecture: ARM, AArch64
Device: Raspberry Pi and others

What is the current bug behavior?

Under -O2 and above, in rare situations, a register can become undefined due to certain peephole optimisations.

What is the behavior after applying this patch?

Such registers should no longer be left undefined.

Relevant logs and/or screenshots

In webtbs/tw22869, the peephole optimisations left x1 undefined, causing the test to fail with an access violation. Before:

.section .text.n_p$program_$$_doit$trec,"ax"
	.balign 8
.globl	P$PROGRAM_$$_DOIT$TREC
	.type	P$PROGRAM_$$_DOIT$TREC,@function
P$PROGRAM_$$_DOIT$TREC:
.Lc4:
	stp	x29,x30,[sp, #-16]!
.Lc5:
	mov	x29,sp
.Lc6:
// Peephole Optimization: RedundantMovProcess 2b done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None 2 done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None done
	ldr	x1,[x1] ; // <-- x1 is undefined.
	ldr	x1,[x1, #200]
	blr	x1
	ldp	x29,x30,[sp], #16
	ret
.Lc3:
.Le1:
	.size	P$PROGRAM_$$_DOIT$TREC, .Le1 - P$PROGRAM_$$_DOIT$TREC

After:

.section .text.n_p$program_$$_doit$trec,"ax"
	.balign 8
.globl	P$PROGRAM_$$_DOIT$TREC
	.type	P$PROGRAM_$$_DOIT$TREC,@function
P$PROGRAM_$$_DOIT$TREC:
.Lc4:
	stp	x29,x30,[sp, #-16]!
.Lc5:
	mov	x29,sp
.Lc6:
// Peephole Optimization: RedundantMovProcess 2a done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None 2 done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None 2a done
// Peephole Optimization: x1 = x0 (MovLdr2Ldr 1)
	ldr	x1,[x0] ; // <-- x1 is now loading from x0, which is defined as it's the procedure's formal parameter.
	ldr	x1,[x1, #200]
	blr	x1
	ldp	x29,x30,[sp], #16
	ret
.Lc3:
.Le1:
	.size	P$PROGRAM_$$_DOIT$TREC, .Le1 - P$PROGRAM_$$_DOIT$TREC

For clarity, this is what the routine looks like with the peephole optimizer turned off:

.section .text.n_p$program$_$tc_$__$$_test,"ax"
	.balign 8
.globl	P$PROGRAM$_$TC_$__$$_TEST
	.type	P$PROGRAM$_$TC_$__$$_TEST,@function
P$PROGRAM$_$TC_$__$$_TEST:
.Lc2:
	ret
.Lc1:
.Le0:
	.size	P$PROGRAM$_$TC_$__$$_TEST, .Le0 - P$PROGRAM$_$TC_$__$$_TEST

.section .text.n_p$program_$$_doit$trec,"ax"
	.balign 8
.globl	P$PROGRAM_$$_DOIT$TREC
	.type	P$PROGRAM_$$_DOIT$TREC,@function
P$PROGRAM_$$_DOIT$TREC:
.Lc4:
	stp	x29,x30,[sp, #-16]!
.Lc5:
	mov	x29,sp
.Lc6:
	mov	x1,x0
	ubfx	x0,x1,#0,#64
	ubfx	x1,x1,#0,#64
	ldr	x1,[x1] ; // <-- x1 = x0, hence it is defined.
	ldr	x1,[x1, #200]
	blr	x1
	ldp	x29,x30,[sp], #16
	ret
.Lc3:
.Le1:
	.size	P$PROGRAM_$$_DOIT$TREC, .Le1 - P$PROGRAM_$$_DOIT$TREC
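
Both ubfx instructions in the unoptimised listing extract 64 bits starting at bit 0, which is why the log reports "SBFX or UBFX -> MOV (full bitfield extract)". The Python sketch below models the two extract operations on 64-bit values to show that a full-width extract returns its input unchanged. The function names and the unsigned-integer representation are assumptions made for illustration and are not taken from the compiler.

```python
MASK64 = (1 << 64) - 1


def ubfx(value: int, lsb: int, width: int) -> int:
    """Unsigned bitfield extract on a 64-bit register value."""
    if not (0 <= lsb < 64 and 1 <= width <= 64 - lsb):
        raise ValueError("lsb/width out of range for a 64-bit register")
    return ((value & MASK64) >> lsb) & ((1 << width) - 1)


def sbfx(value: int, lsb: int, width: int) -> int:
    """Signed bitfield extract, returned as an unsigned 64-bit pattern."""
    field = ubfx(value, lsb, width)
    sign_bit = 1 << (width - 1)
    # Sign-extend the field, then wrap back into 64 bits.
    return ((field ^ sign_bit) - sign_bit) & MASK64


def is_full_extract(lsb: int, width: int) -> bool:
    # The case in which the extract behaves like a plain register move.
    return lsb == 0 and width == 64
```

For any 64-bit value v, ubfx(v, 0, 64) and sbfx(v, 0, 64) both return v, so ubfx x1,x1,#0,#64 behaves like mov x1,x1, the self-move at the centre of this bug.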

Edited Mar 04, 2024 by J. Gareth "Kit" Moreton
