
New extra optimisation information feature and showcase

Summary

This merge request adds a new feature to the peephole optimizer that allows extra information to be attached to tai objects via a linked list and recalled later. This lets the peephole optimizer leave 'notes' and the like for later use, potentially allowing for much deeper optimisations.

The showcase attaches a reference to the destination label to a jump when that label is not present in the lookup table because it was added as part of an earlier peephole optimisation. It does this with an extra optimisation information object that holds a reference to another tai object.

To accommodate this showcase, some references to "getlabelwithsym" have been replaced with a new method named "GetDestinationLabel", which takes a taicpu object and, if it's a jump to a label, returns the tai_label representing that label. It first acquires the label symbol and calls "getlabelwithsym"; if that returns nil, it attempts to fetch the relevant extra information object and return the tai_label object stored there. As a final check, it verifies that the symbols match (just in case the jump destination was changed but the extra information wasn't updated).
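The lookup order described above can be sketched as a small Python model. This is only an illustration of the three steps, not the compiler's Pascal code: `Label`, `Jump`, `dest_note` and `get_destination_label` are hypothetical stand-ins for tai_label, taicpu, the extra information object and "GetDestinationLabel", and a plain dictionary stands in for the label lookup table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Label:
    """Stand-in for a tai_label; `symbol` plays the role of the label symbol."""
    symbol: str


@dataclass
class Jump:
    """Stand-in for a taicpu jump instruction."""
    target_symbol: str
    # Hypothetical slot for the extra information: a direct reference to the
    # destination label object, or None if no note was left on this jump.
    dest_note: Optional[Label] = None


def get_destination_label(jump: Jump, table: Dict[str, Label]) -> Optional[Label]:
    # Step 1: ordinary table lookup (the role "getlabelwithsym" plays).
    found = table.get(jump.target_symbol)
    if found is not None:
        return found
    # Step 2: fall back to the note left by an earlier optimisation.
    note = jump.dest_note
    # Step 3: trust the note only if its symbol still matches the jump target,
    # in case the destination changed but the note was not updated.
    if note is not None and note.symbol == jump.target_symbol:
        return note
    return None
```

In this model, a jump whose target was retargeted after the note was written simply gets no label back, which mirrors the "just in case" check in the description.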

It uses the old "optinfo" field to store the first extra optimisation information object associated with the tai. The objects themselves are stored in a linked list owned by the TAOptObj object in charge of everything, and are released en masse when that object is destroyed.
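As a rough picture of the ownership scheme, here is a minimal Python sketch. The exact node layout is an assumption: it supposes each note sits on two chains, one starting at the instruction's "optinfo" field and one owned by the optimizer. `ExtraInfo`, `Instr`, `Optimizer`, `attach`, `find` and `release_all` are all hypothetical names, not the compiler's.

```python
class ExtraInfo:
    """One note; chained per instruction and in an optimizer-wide list."""
    def __init__(self, kind, payload):
        self.kind = kind
        self.payload = payload
        self.next_for_instr = None  # next note on the same instruction
        self.next_owned = None      # next note in the owner's list


class Instr:
    """Stand-in for a tai object; `optinfo` points at its first note."""
    def __init__(self, text):
        self.text = text
        self.optinfo = None


class Optimizer:
    """Stand-in for the owning optimizer object."""
    def __init__(self):
        self.owned_head = None

    def attach(self, instr, kind, payload):
        info = ExtraInfo(kind, payload)
        # Push onto the instruction's own chain.
        info.next_for_instr = instr.optinfo
        instr.optinfo = info
        # Push onto the owner's list so everything can be freed together.
        info.next_owned = self.owned_head
        self.owned_head = info
        return info

    def find(self, instr, kind):
        node = instr.optinfo
        while node is not None:
            if node.kind == kind:
                return node
            node = node.next_for_instr
        return None

    def release_all(self):
        """Unlink every note in one sweep; returns how many were released."""
        count = 0
        node = self.owned_head
        while node is not None:
            following = node.next_owned
            node.next_for_instr = node.next_owned = node.payload = None
            node = following
            count += 1
        self.owned_head = None
        return count
```

Because the owner holds every note, individual passes never need to free them; a single sweep at teardown is enough, which is the point of the "released en masse" design.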

ADDITIONAL: An additional commit now allows x86_64 to leave the compiler hints on "mov %reg32,%reg32" instructions when it knows that the upper 32 bits of the register being read are zero. This permits deeper optimisations when the full 64 bits of the destination register are read later on, since it can now safely replace that register with the source register, knowing the upper 32 bits won't become undefined. This is only done under -O2 and above.
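The reasoning rests on the x86_64 rule that writing a 32-bit register zero-extends into the full 64-bit register. The following Python model of that rule is only a sketch (the helper names are made up, and it checks concrete values at run time, whereas the compiler reasons statically), but it shows why the substitution is safe only when the source's upper half is known to be zero.

```python
MASK32 = 0xFFFFFFFF


def read32(regs, name):
    return regs[name] & MASK32


def write32(regs, name, value):
    # A 32-bit write on x86_64 clears the upper 32 bits of the register.
    regs[name] = value & MASK32


def add32_imm(regs, name, imm):
    # Models e.g. "addl $1,%r12d".
    write32(regs, name, read32(regs, name) + imm)


def mov32(regs, src, dst):
    # Models e.g. "movl %r12d,%eax".
    write32(regs, dst, read32(regs, src))


def can_substitute(regs, src, dst):
    # The full 64-bit source may replace the destination only if they agree.
    return regs[src] == regs[dst]


def demo(with_add):
    # Both registers start with arbitrary non-zero upper halves.
    regs = {"r12": 0xDEADBEEF00000007, "rax": 0x1111111122222222}
    if with_add:
        add32_imm(regs, "r12", 1)  # leaves the upper half of r12 zeroed
    mov32(regs, "r12", "rax")
    return can_substitute(regs, "r12", "rax")
```

Here `demo(True)` returns `True` and `demo(False)` returns `False`: without a preceding 32-bit write to the source, its upper half may hold anything, so reading all 64 bits of it in place of the destination would be wrong.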

System

Everything. Feature implemented for cross-platform jump optimisations and for some x86-specific optimisations.

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Currently, there should be very few, if any, changes to output code (no differences appear in the RTL), but in some rare cases, deeper jump optimisations should be possible.

ADDITIONAL: A large number of new register-related streamlining optimisations are now performed under x86_64.

Relevant logs and/or screenshots

Currently no changes have been noted in the RTL or compiler source with regard to the jump feature. However, for the register streamlining, there are many examples in many units - in the SysUtils unit, before:

        ...
.Lj102:
	addl	$1,%r12d
	shll	$4,%edi
	movl	%r12d,%eax
	movzbl	(%rbx,%rax,1),%ecx
	call	SYSTEM_$$_UPCASE$CHAR$$CHAR
        ...

Because of the "addl" instruction, the compiler knows that %r12 has its upper 32 bits set to zero, so a hint is left at the following MOV instruction that the existing peephole optimizer code can take advantage of - after:

.Lj102:
	addl	$1,%r12d
	shll	$4,%edi
	movzbl	(%rbx,%r12,1),%ecx
	call	SYSTEM_$$_UPCASE$CHAR$$CHAR
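The rewrite shown in this before/after pair can be imitated with a toy pass. It is a deliberately simplified sketch and not how the compiler's peephole optimizer is written: instructions are plain dictionaries, the hint is a hypothetical `src_upper32_zero` flag, and the fact that %rax is not needed after the next instruction is supplied by hand through a hypothetical `dst_dead_after_next` flag, where a real pass would have to establish it from register usage information.

```python
# 32-bit register name -> 64-bit register name (only those used in the demo).
REG64 = {"%eax": "%rax", "%r12d": "%r12", "%edi": "%rdi"}


def fold_hinted_mov(instrs):
    """Drop a hinted 'movl src,dst' and use src's 64-bit form in the next operand."""
    out = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (nxt is not None
                and cur["op"] == "movl"
                and cur.get("src_upper32_zero")
                and cur.get("dst_dead_after_next")):
            old, new = REG64[cur["dst"]], REG64[cur["src"]]
            if old in nxt["src"]:
                # Rewrite the following operand and skip the movl entirely.
                out.append({**nxt, "src": nxt["src"].replace(old, new)})
                i += 2
                continue
        out.append(cur)
        i += 1
    return out


before = [
    {"op": "addl", "src": "$1", "dst": "%r12d"},
    {"op": "shll", "src": "$4", "dst": "%edi"},
    {"op": "movl", "src": "%r12d", "dst": "%eax",
     "src_upper32_zero": True, "dst_dead_after_next": True},
    {"op": "movzbl", "src": "(%rbx,%rax,1)", "dst": "%ecx"},
]
after = fold_hinted_mov(before)
```

With the flags set, `after` contains the addl, the shll and a movzbl reading `(%rbx,%r12,1)`; with the hint removed, the list comes back unchanged.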

Later in SysUtils there's a double whammy - before:

.Lj1203:
	leal	1(%edi),%eax
	addl	$1,%edi
	movq	-2088(%rbp,%rax,8),%rcx
	call	fpc_pchar_length
	addl	%eax,%esi
	cmpl	%ebx,%edi
	jnge	.Lj1203
.Lj1202:
	movslq	%esi,%rdx
	movq	-2128(%rbp),%rcx
	xorl	%r8d,%r8d
	call	fpc_ansistr_setlength
	movl	%r12d,%ebx
	cmpl	%r12d,%r13d
	jnle	.Lj1199
	leal	-1(%r13d),%edi
	.p2align 4,,10
	.p2align 3
.Lj1208:
	leal	1(%edi),%eax
	addl	$1,%edi
	movq	-2088(%rbp,%rax,8),%rcx
	call	fpc_pchar_length

After:

.Lj1203:
	addl	$1,%edi
	movq	-2088(%rbp,%rdi,8),%rcx
	call	fpc_pchar_length
	addl	%eax,%esi
	cmpl	%ebx,%edi
	jnge	.Lj1203
.Lj1202:
	movslq	%esi,%rdx
	movq	-2128(%rbp),%rcx
	xorl	%r8d,%r8d
	call	fpc_ansistr_setlength
	movl	%r12d,%ebx
	cmpl	%r12d,%r13d
	jnle	.Lj1199
	leal	-1(%r13d),%edi
	.p2align 4,,10
	.p2align 3
.Lj1208:
	addl	$1,%edi
	movq	-2088(%rbp,%rdi,8),%rcx
	call	fpc_pchar_length

Because of the register hint, the MOV instructions get optimised out, so the AddMov2LeaAdd optimisation in OptPass2ADD no longer fires and the two "leal 1(%edi),%eax" instructions are not generated.

The compiler itself receives improvements in the generated code in a large number of units under x86_64, and despite the introduction of the feature, x86_64-win64's compiler only grows by 2,048 bytes under -O4.

The feature is planned to be used more extensively later.

Edited by J. Gareth "Kit" Moreton
