
x86: Optimisation simplifications

Summary

This merge request simplifies a couple of post-peephole optimisations on x86-64 platforms:

  • The MOVZX optimisation that changes the BQ and WQ sizes to BL and WL respectively has been removed: although the transformation is legal, the compiler never generates MOVZX with these sizes.
  • The xorq->xorl optimisation no longer checks whether the register is RAX, RCX, RDX, RBX, RSI, RDI, RBP or RSP. Silvermont processors only recognise the 32-bit version as dependency-breaking, not the 64-bit one, so the change is beneficial for all registers, not just those for which it also removes the REX prefix.
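As a rough illustration of the second change, here is a Python model. It is not the compiler's Pascal code, and the helper names `xorq_to_xorl` and `xor_self_encoding` are hypothetical. The model applies the rewrite to any 64-bit register and hand-computes the `xor reg,reg` encodings. It shows why the old register check existed. For RAX..RSP the 32-bit form drops the REX prefix and saves a byte. For %r8-%r15 a REX prefix is still required, so the only benefit there is the Silvermont dependency-breaking behaviour. The encodings assume opcode `31 /r`; an assembler may pick the equivalent `33 /r` form instead.

```python
# Hypothetical model of the post-peephole "xorq -> xorl" rule; an illustration
# of the behaviour described above, not FPC's actual implementation.

# x86-64 general-purpose registers in hardware encoding order (0..15),
# paired with their 32-bit sub-register names.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
          "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d"]


def xorq_to_xorl(instr: str) -> str:
    """Rewrite 'xorq %reg,%reg' as 'xorl %regd,%regd' for ANY 64-bit register.

    A 32-bit write zero-extends into bits 32-63, so the result is unchanged.
    There is deliberately no RAX..RSP-only restriction.
    """
    parts = instr.split()
    if len(parts) != 2 or parts[0] != "xorq":
        return instr
    src, _, dst = parts[1].partition(",")
    name = src.lstrip("%")
    if src != dst or not src.startswith("%") or name not in REGS64:
        return instr  # not a self-zeroing xorq on a 64-bit GPR
    r32 = "%" + REGS32[REGS64.index(name)]
    return f"xorl {r32},{r32}"


def xor_self_encoding(reg: str, wide: bool) -> bytes:
    """Machine code for 'xor reg,reg' via opcode 31 /r with a register-direct ModRM."""
    n = REGS64.index(reg)
    ext = 1 if n >= 8 else 0
    # REX = 0100WRXB; R and B both extend the same register in this idiom.
    rex = 0x40 | (0x08 if wide else 0) | (ext << 2) | ext
    modrm = 0xC0 | ((n & 7) << 3) | (n & 7)
    prefix = bytes([rex]) if rex != 0x40 else b""  # a bare 0x40 REX adds nothing here
    return prefix + bytes([0x31, modrm])


for reg in ("rax", "r12"):
    print(reg, xor_self_encoding(reg, True).hex(" "), "->",
          xor_self_encoding(reg, False).hex(" "))
```

This prints `rax 48 31 c0 -> 31 c0` and `r12 4d 31 e4 -> 45 31 e4`. The rewrite saves a byte for RAX but leaves %r12 at three bytes.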

System

  • Processor architecture: x86-64 (not i386, because the blocks of code in question are enclosed in "{$ifdef x86_64}" directives).

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

  • Confirm correct compilation and no test regressions.
  • Confirm that removal of the MOVZX optimisation has zero effect on generated code, because the required input sizes are never generated.
  • Confirm that xorq %reg,%reg is now changed to xorl on the corresponding 32-bit register (e.g. xorl %r12d,%r12d) for registers %r8-%r15. The debug comment is also slightly different: it now says "32-bit register recommended when zeroing 64-bit counterpart" rather than "removes REX prefix".
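One way to carry out the last check is to scan an assembler listing for self-zeroing `xorq` instructions that survived optimisation. The following is a hypothetical Python helper, not part of the FPC test suite. It assumes AT&T syntax with no space after the comma, as in the logs in this merge request, and uses trimmed excerpts of those logs as sample input.

```python
import re

# Self-zeroing 64-bit xor, e.g. "xorq %r12,%r12" (same register on both sides).
# Peephole debug comments start with "#", so the ^\s*xorq anchor skips them.
SELF_XORQ = re.compile(r"^\s*xorq\s+%(\w+),%\1\s*$")
XOR_NOTE = "(32-bit register recommended when zeroing 64-bit counterpart)"


def leftover_xorq_zeroing(asm_text: str) -> list:
    """Instruction lines that still zero a register with the 64-bit xorq."""
    return [ln.strip() for ln in asm_text.splitlines() if SELF_XORQ.match(ln)]


def xor_peephole_notes(asm_text: str) -> int:
    """Number of debug comments emitted by the reworked xorq -> xorl rule."""
    return sum(1 for ln in asm_text.splitlines()
               if ln.lstrip().startswith("#") and XOR_NOTE in ln)


# Trimmed excerpts of the Sysutils -O4 listings shown in this merge request.
BEFORE = "\tcall\tfpc_dynarray_setlength\n\txorq\t%r12,%r12\n\txorq\t%r13,%r13\n"
AFTER = (
    "\tcall\tfpc_dynarray_setlength\n"
    "# Peephole Optimization: xorq %r12,%r12 -> xorl %r12d,%r12d " + XOR_NOTE + "\n"
    "\txorl\t%r12d,%r12d\n"
    "# Peephole Optimization: xorq %r13,%r13 -> xorl %r13d,%r13d " + XOR_NOTE + "\n"
    "\txorl\t%r13d,%r13d\n"
)

print(leftover_xorq_zeroing(BEFORE))
print(leftover_xorq_zeroing(AFTER), xor_peephole_notes(AFTER))
```

On a real listing, an empty result from `leftover_xorq_zeroing` indicates that the rule fired for every self-zeroing `xorq`.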

Relevant logs and/or screenshots

Example from Sysutils under -O4 - before:

        ...
.Lj2412:
	movq	$10,-48(%rbp)
	leaq	-48(%rbp),%r9
	movq	-32(%rbp),%rcx
	leaq	RTTI_$SYSUTILS_$$_TSTRINGARRAY(%rip),%rdx
# Peephole Optimization: movq $1,%r8 -> movl $1,%r8d (immediate can be represented with just 32 bits)
	movl	$1,%r8d
	call	fpc_dynarray_setlength
	xorq	%r12,%r12
	xorq	%r13,%r13
	movq	%rbp,%rcx
        ...

After:

        ...
.Lj2412:
	movq	$10,-48(%rbp)
	leaq	-48(%rbp),%r9
	movq	-32(%rbp),%rcx
	leaq	RTTI_$SYSUTILS_$$_TSTRINGARRAY(%rip),%rdx
# Peephole Optimization: movq $1,%r8 -> movl $1,%r8d (immediate can be represented with just 32 bits)
	movl	$1,%r8d
	call	fpc_dynarray_setlength
# Peephole Optimization: xorq %r12,%r12 -> xorl %r12d,%r12d (32-bit register recommended when zeroing 64-bit counterpart)
	xorl	%r12d,%r12d
# Peephole Optimization: xorq %r13,%r13 -> xorl %r13d,%r13d (32-bit register recommended when zeroing 64-bit counterpart)
	xorl	%r13d,%r13d
	movq	%rbp,%rcx
        ...
Edited by J. Gareth "Kit" Moreton
