[ARM / AArch64 / Bug Fix] Undefined register fix (webtbs/tw22869)
Summary
This merge request fixes an issue where certain combinations of peephole optimisations can cause both instructions in a mov r1,r0; mov r1,r1 pair to be removed, leaving r1 undefined. This specifically caused webtbs/tw22869 to fail.
Specifically, if an instruction such as ubfx x1,x1,#0,#64 is optimised to mov x1,x1, an earlier mov x1,x0 can sometimes be optimised out before mov x1,x1 is itself removed. This happens because the peephole optimizer assumes that mov x1,x0 is a dead store, which would be true if the following mov x1,... were not reading from x1 itself.
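The faulty dead-store assumption can be illustrated with a small Python model. This is only a sketch: the Instr type and both check functions are hypothetical names invented for illustration and do not reflect the compiler's actual peephole code.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical, simplified instruction: one destination, any number of sources."""
    op: str
    dst: str
    srcs: Tuple[str, ...]


def is_dead_store_naive(code: List[Instr], i: int) -> bool:
    """Flawed check: code[i] is 'dead' if the next instruction overwrites its destination."""
    return i + 1 < len(code) and code[i + 1].dst == code[i].dst


def is_dead_store_safe(code: List[Instr], i: int) -> bool:
    """Safer check: the overwriting instruction must not also read that register."""
    if i + 1 >= len(code):
        return False
    nxt = code[i + 1]
    return nxt.dst == code[i].dst and code[i].dst not in nxt.srcs


# mov x1,x0 followed by mov x1,x1 (the latter produced from ubfx x1,x1,#0,#64)
self_read = [Instr("mov", "x1", ("x0",)), Instr("mov", "x1", ("x1",))]
# mov x1,x0 followed by mov x1,x2: here the first store really is dead
overwrite = [Instr("mov", "x1", ("x0",)), Instr("mov", "x1", ("x2",))]

naive_self_read = is_dead_store_naive(self_read, 0)  # wrongly reports a dead store
safe_self_read = is_dead_store_safe(self_read, 0)    # mov x1,x1 still reads x1
safe_overwrite = is_dead_store_safe(overwrite, 0)    # genuine dead store
```

In this model the naive check would allow mov x1,x0 to be removed even though mov x1,x1 still depends on it, which matches the failure mode described above. The safe variant only treats the first store as dead when the overwriting instruction does not read the register.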
System
- Operating system: Linux (Raspberry Pi OS) and others
- Processor architecture: ARM, AArch64
- Device: Raspberry Pi and others
What is the current bug behavior?
Under -O2 and above, in rare situations, a register can become undefined due to certain peephole optimisations.
What is the behavior after applying this patch?
Such registers should no longer be left undefined.
Relevant logs and/or screenshots
In webtbs/tw22869, the peephole optimisations caused r1 to be undefined and the test to fail with an access violation. Before:
.section .text.n_p$program_$$_doit$trec,"ax"
.balign 8
.globl P$PROGRAM_$$_DOIT$TREC
.type P$PROGRAM_$$_DOIT$TREC,@function
P$PROGRAM_$$_DOIT$TREC:
.Lc4:
stp x29,x30,[sp, #-16]!
.Lc5:
mov x29,sp
.Lc6:
// Peephole Optimization: RedundantMovProcess 2b done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None 2 done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None done
ldr x1,[x1] ; // <-- x1 is undefined.
ldr x1,[x1, #200]
blr x1
ldp x29,x30,[sp], #16
ret
.Lc3:
.Le1:
.size P$PROGRAM_$$_DOIT$TREC, .Le1 - P$PROGRAM_$$_DOIT$TREC
After:
.section .text.n_p$program_$$_doit$trec,"ax"
.balign 8
.globl P$PROGRAM_$$_DOIT$TREC
.type P$PROGRAM_$$_DOIT$TREC,@function
P$PROGRAM_$$_DOIT$TREC:
.Lc4:
stp x29,x30,[sp, #-16]!
.Lc5:
mov x29,sp
.Lc6:
// Peephole Optimization: RedundantMovProcess 2a done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None 2 done
// Peephole Optimization: SBFX or UBFX -> MOV (full bitfield extract)
// Peephole Optimization: Mov2None 2a done
// Peephole Optimization: x1 = x0 (MovLdr2Ldr 1)
ldr x1,[x0] ; // <-- x1 is now loading from x0, which is defined as it's the procedure's formal parameter.
ldr x1,[x1, #200]
blr x1
ldp x29,x30,[sp], #16
ret
.Lc3:
.Le1:
.size P$PROGRAM_$$_DOIT$TREC, .Le1 - P$PROGRAM_$$_DOIT$TREC
For clarity, this is what the routine looks like with the peephole optimizer turned off:
.section .text.n_p$program$_$tc_$__$$_test,"ax"
.balign 8
.globl P$PROGRAM$_$TC_$__$$_TEST
.type P$PROGRAM$_$TC_$__$$_TEST,@function
P$PROGRAM$_$TC_$__$$_TEST:
.Lc2:
ret
.Lc1:
.Le0:
.size P$PROGRAM$_$TC_$__$$_TEST, .Le0 - P$PROGRAM$_$TC_$__$$_TEST
.section .text.n_p$program_$$_doit$trec,"ax"
.balign 8
.globl P$PROGRAM_$$_DOIT$TREC
.type P$PROGRAM_$$_DOIT$TREC,@function
P$PROGRAM_$$_DOIT$TREC:
.Lc4:
stp x29,x30,[sp, #-16]!
.Lc5:
mov x29,sp
.Lc6:
mov x1,x0
ubfx x0,x1,#0,#64
ubfx x1,x1,#0,#64
ldr x1,[x1] ; // <-- x1 = x0, hence it is defined.
ldr x1,[x1, #200]
blr x1
ldp x29,x30,[sp], #16
ret
.Lc3:
.Le1:
.size P$PROGRAM_$$_DOIT$TREC, .Le1 - P$PROGRAM_$$_DOIT$TREC
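The two ubfx ...,#0,#64 instructions in the unoptimised listing extract all 64 bits starting at bit 0, which is why the "SBFX or UBFX -> MOV (full bitfield extract)" optimisation can treat them as plain moves. A minimal Python model of the unsigned extract shows this; the ubfx helper here is a hypothetical illustration, not compiler code.

```python
MASK64 = (1 << 64) - 1  # width of an AArch64 x register


def ubfx(value: int, lsb: int, width: int) -> int:
    """Model UBFX on a 64-bit register: take `width` bits from bit `lsb`, zero-extended."""
    assert 0 <= lsb < 64 and 1 <= width <= 64 - lsb
    return ((value & MASK64) >> lsb) & ((1 << width) - 1)


sample = 0xDEADBEEFCAFEF00D
full_extract = ubfx(sample, 0, 64)     # every bit is kept, so this behaves like a mov
partial_extract = ubfx(sample, 8, 16)  # a genuine bitfield extract, not a mov
```

Because the full-width case returns its input unchanged, ubfx x1,x1,#0,#64 reduces to mov x1,x1, the self-move that later interacts badly with the dead-store assumption.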