x86: Optimisation simplifications (!73) · FPC Source

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:x86-opt-simplification into main Oct 10, 2021

Summary

This merge request simplifies a couple of post-peephole optimisations on x86-64 platforms:

The MOVZX optimisation that changed the BQ and WQ sizes to BL and WL respectively has been removed because, while those sizes are legal, the compiler never generates them.
The xorq->xorl optimisation no longer checks whether the register is RAX, RCX, RDX, RBX, RSI, RDI, RBP or RSP. Silvermont processors only recognise the 32-bit version as dependency-breaking, not the 64-bit one, so the change is beneficial for all registers, not just those that permit the removal of the REX prefix.
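
To illustrate the REX prefix point, here is a minimal Python sketch that hand-encodes "xor reg,reg" for a given register number. It is not taken from the FPC sources: the helper name encode_xor_self and the register numbers are illustrative assumptions, and the sketch only applies the standard x86-64 encoding rules (opcode 0x31, register-direct ModRM, REX prefix laid out as 0100WRXB).

```python
def encode_xor_self(reg: int, wide: bool) -> bytes:
    """Encode 'xor reg,reg' (opcode 0x31 /r) for register number 0..15.

    wide=True is the 64-bit form (xorq), wide=False the 32-bit form (xorl).
    Hypothetical helper for illustration only.
    """
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0..15")
    high = reg >> 3                    # 1 for r8..r15: needs REX.R and REX.B
    low = reg & 7
    rex = 0x40 | (int(wide) << 3) | (high << 2) | high
    modrm = 0xC0 | (low << 3) | low    # mod=11: both operands are registers
    prefix = bytes([rex]) if rex != 0x40 else b""   # no REX bits set: omit it
    return prefix + bytes([0x31, modrm])


RAX, R12 = 0, 12   # hardware register numbers
for name, reg in (("rax", RAX), ("r12", R12)):
    q = encode_xor_self(reg, wide=True)
    l = encode_xor_self(reg, wide=False)
    print(f"{name}: xorq = {q.hex(' ')} ({len(q)} bytes), "
          f"xorl = {l.hex(' ')} ({len(l)} bytes)")
```

For RAX the 32-bit form drops the REX prefix entirely (48 31 c0 becomes 31 c0). For R12 both forms are three bytes (4d 31 e4 versus 45 31 e4), because the prefix is still needed to address the register. The benefit for %r8-%r15 is therefore not code size but the dependency-breaking behaviour described above.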

System

Processor architecture: x86-64 (not i386, because the blocks of code in question are surrounded by "{$ifdef x86_64}" directives).

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Confirm correct compilation and no test regressions.
Confirm that removal of the MOVZX optimisation has zero effect on generated code because the required input sizes are never generated.
Confirm changes of xorq %reg,%reg to xorl %reg,%reg for registers %r8-%r15 (debug comment is slightly different now too, saying "32-bit register recommended when zeroing 64-bit counterpart" rather than "removes REX prefix").
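
One rough way to picture the expected change is a text-level model of the rewrite, sketched below in Python. This is only a toy under stated assumptions: the real optimiser works inside the compiler rather than on assembly text, and the names rewrite and to_32bit_name are hypothetical. The sketch rewrites every self-xor of a 64-bit register, with no check on which register it is, and emits the new debug comment.

```python
import re

# Matches "xorq %REG,%REG" in AT&T syntax; operands are compared afterwards.
_XORQ_SELF = re.compile(r"^(\s*)xorq(\s+)%(\w+),%(\w+)\s*$")


def to_32bit_name(reg64: str) -> str:
    """Map a 64-bit register name to its 32-bit counterpart."""
    if reg64[1:].isdigit():        # r8..r15 -> r8d..r15d
        return reg64 + "d"
    return "e" + reg64[1:]         # rax -> eax, rsi -> esi, ...


def rewrite(lines):
    """Toy model: replace xorq %r,%r with xorl on the 32-bit register."""
    out = []
    for line in lines:
        m = _XORQ_SELF.match(line)
        if m and m.group(3) == m.group(4):
            indent, gap, reg = m.group(1), m.group(2), m.group(3)
            r32 = to_32bit_name(reg)
            out.append(
                f"# Peephole Optimization: xorq %{reg},%{reg} -> "
                f"xorl %{r32},%{r32} "
                "(32-bit register recommended when zeroing 64-bit counterpart)"
            )
            out.append(f"{indent}xorl{gap}%{r32},%{r32}")
        else:
            out.append(line)       # everything else passes through unchanged
    return out


for line in rewrite(["\txorq\t%r12,%r12", "\tmovq\t%rbp,%rcx"]):
    print(line)
```

Feeding it the two xorq lines from the Sysutils listing should produce the same comment and xorl lines shown in the "After" output.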

Relevant logs and/or screenshots

Example from Sysutils under -O4 - before:

        ...
.Lj2412:
	movq	$10,-48(%rbp)
	leaq	-48(%rbp),%r9
	movq	-32(%rbp),%rcx
	leaq	RTTI_$SYSUTILS_$$_TSTRINGARRAY(%rip),%rdx
# Peephole Optimization: movq $1,%r8 -> movl $1,%r8d (immediate can be represented with just 32 bits)
	movl	$1,%r8d
	call	fpc_dynarray_setlength
	xorq	%r12,%r12
	xorq	%r13,%r13
	movq	%rbp,%rcx
        ...

After:

        ...
.Lj2412:
	movq	$10,-48(%rbp)
	leaq	-48(%rbp),%r9
	movq	-32(%rbp),%rcx
	leaq	RTTI_$SYSUTILS_$$_TSTRINGARRAY(%rip),%rdx
# Peephole Optimization: movq $1,%r8 -> movl $1,%r8d (immediate can be represented with just 32 bits)
	movl	$1,%r8d
	call	fpc_dynarray_setlength
# Peephole Optimization: xorq %r12,%r12 -> xorl %r12d,%r12d (32-bit register recommended when zeroing 64-bit counterpart)
	xorl	%r12d,%r12d
# Peephole Optimization: xorq %r13,%r13 -> xorl %r13d,%r13d (32-bit register recommended when zeroing 64-bit counterpart)
	xorl	%r13d,%r13d
	movq	%rbp,%rcx
        ...

Edited Oct 10, 2021 by J. Gareth "Kit" Moreton
