[Patch] ARM/AArch64 Some short-range LDR/STR optimisations

Original Reporter info from Mantis: CuriousKit @CuriousKit
  • Reporter name: J. Gareth Moreton

Description:

The "ldrstr.patch" file provides some short-range optimisations for LDR and STR instructions that remove unnecessary instructions (e.g. storing a register to memory, then loading from the same address into the same register). These optimisations are performed on all ARM platforms. The patch also fixes a minor bug in the RedundantMovProcess routine for AArch64 (this optimisation often occurs after the new "load/load -> load/move" optimisation is made).

The "peephole-string.patch" seeks to homogenise the optimisation comments that appear when DEBUG_AOPTCPU is declared, prepending all such messages with the SPeepholeOptimization string constant, much like the x86 implementations.

Steps to reproduce:

Apply patch and confirm correct compilation on all ARM and AArch64 platforms.

Additional information:

The two patches share a hunk (the declaration of SPeepholeOptimization) and a single rejection will occur when they are applied together. This won't cause a bad merge.

I confess that this patch hasn't been fully tested on Arm-32 platforms due to technical reasons - third-party testing would be required.

Some examples of the optimisations under aarch64-linux:

In the Sysutils unit - before:

	strh	w2,[x0]
	ldrh	w0,[x0]
	ldp	x29,x30,[sp], 16
	ret
.Le429:

After:

	strh	w2,[x0]
	uxth	w0,w2 <-- ldrh changed to a uxth instruction, chosen from the size suffixes of the str and ldr instructions (minimises read-after-write penalty).
	ldp	x29,x30,[sp], 16
	ret
.Le429:

----

Also in Sysutils - before:

	str	x0,[sp, 24]
	ldr	x0,[sp, 24]
	str	x0,[sp, 32]
.Lj3450:

After:

	stp	x0,x0,[sp, 24] <-- the ldr instruction is removed because x0 already contains the value at the address specified (because it was just written there), and then the two str instructions are merged into an stp instruction later on.
.Lj3450:
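
As a rough illustration of the two Sysutils cases above, here is a toy Python model of the store-then-load rewrite. It is only a sketch: the tuple representation and the function name fold_store_then_load are made up for this example and are not taken from the compiler sources, and it ignores writeback addressing, signed loads and intervening instructions.

```python
# Toy model only: each instruction is an (opcode, register, address) tuple.
# None of these names come from the Free Pascal sources.

# Zero-extension instruction matching each sub-word size suffix.
SIZE_SUFFIX_TO_EXTEND = {"h": "uxth", "b": "uxtb"}


def fold_store_then_load(instrs):
    """Rewrite 'str Rs,[addr]; ldr Rd,[addr]' pairs where the load is redundant."""
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        if (
            prev is not None
            and prev[0].startswith("str")
            and ins[0].startswith("ldr")
            and prev[0][3:] == ins[0][3:]    # same size suffix ("", "h" or "b")
            and prev[1][0] == ins[1][0]      # same register width (w or x)
            and prev[2] == ins[2]            # same address expression
        ):
            suffix = ins[0][3:]
            src, dst = prev[1], ins[1]
            if suffix in SIZE_SUFFIX_TO_EXTEND:
                # Sub-word store/load: the load would zero-extend, so extend
                # the stored register instead of reading memory back.
                out.append((SIZE_SUFFIX_TO_EXTEND[suffix], dst, src))
                continue
            if suffix == "":
                if src != dst:
                    out.append(("mov", dst, src))
                # Same register: the load is dropped entirely.
                continue
        out.append(ins)
    return out


first = fold_store_then_load([("strh", "w2", "[x0]"), ("ldrh", "w0", "[x0]")])
second = fold_store_then_load(
    [("str", "x0", "[sp, 24]"), ("ldr", "x0", "[sp, 24]"), ("str", "x0", "[sp, 32]")]
)
```

In this model the first sequence ends with a uxth in place of the ldrh, and the second is left with two adjacent str instructions, which is the shape a later pass would need in order to merge them into an stp.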

----

In the Classes unit - before:

	b.ne	.Lj947
	ldr	x0,[sp, 16]
	ldr	x1,[sp, 16]
	ldr	x1,[x1, 104]
	blr	x1
	str	x0,[sp, 16]
.Lj947:

After:

	b.ne	.Lj947
	ldr	x0,[sp, 16]
	ldr	x1,[x0, 104] <-- Second ldr was changed to "mov x1,x0", which was then optimised by RedundantMovProcess and merged into the 3rd ldr.
	blr	x1
	str	x0,[sp, 16]
.Lj947:
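
The Classes example above goes through two steps, which can be modelled with the toy Python sketch below. The function names second_load_to_move and fold_mov_into_load are hypothetical and stand in for the "load/load -> load/move" rewrite and for the part of RedundantMovProcess that forwards the moved register. The real routines work on the compiler's own instruction lists and perform more checks than shown here.

```python
import re

# Toy model only: each instruction is an (opcode, register, operand) tuple.
# A mov is stored as ("mov", destination, source).


def second_load_to_move(instrs):
    """Turn 'ldr Ra,[addr]; ldr Rb,[addr]' into 'ldr Ra,[addr]; mov Rb,Ra'."""
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        if (
            prev is not None
            and prev[0] == "ldr"
            and ins[0] == "ldr"
            and prev[2] == ins[2]          # same address expression
            and prev[1] != ins[1]          # different destination registers
            and prev[1][0] == ins[1][0]    # same register width (w or x)
            and prev[1] not in prev[2]     # conservative: first load must not clobber its own base
        ):
            out.append(("mov", ins[1], prev[1]))
            continue
        out.append(ins)
    return out


def fold_mov_into_load(instrs):
    """Merge 'mov Rb,Ra; ldr Rb,[Rb, off]' into 'ldr Rb,[Ra, off]'."""
    out = []
    for ins in instrs:
        prev = out[-1] if out else None
        if prev is not None and prev[0] == "mov" and ins[0] == "ldr" and ins[1] == prev[1]:
            pattern = r"\b" + re.escape(prev[1]) + r"\b"
            if re.search(pattern, ins[2]):
                # The load overwrites Rb, so the mov is dead once the
                # address is rewritten to use Ra directly.
                out[-1] = ("ldr", ins[1], re.sub(pattern, prev[2], ins[2]))
                continue
        out.append(ins)
    return out


before = [
    ("ldr", "x0", "[sp, 16]"),
    ("ldr", "x1", "[sp, 16]"),
    ("ldr", "x1", "[x1, 104]"),
    ("blr", "x1", ""),
    ("str", "x0", "[sp, 16]"),
]
after = fold_mov_into_load(second_load_to_move(before))
```

Running the two passes in this order on the "before" listing gives the same four instructions as the "after" listing.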

----

Longer-range optimisations of this kind are still being researched because references are involved - watch this space!

Mantis conversion info:

  • Mantis ID: 38841
  • OS: Debian GNU/Linux (Raspberry Pi)
  • OS Build: 10
  • Build: r49298
  • Platform: arm and aarch64
  • Version: 3.3.1