Compiler auto-inline suggestions (!197) · FPC Source

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:compiler-auto-inline-suggestions into main Apr 14, 2022

Summary

This merge request was developed by studying the output of the experimental auto-inline feature when compiling the compiler itself, and cherry-picking its good choices. It provides a performance boost for the compiler while auto-inline is still in development.

System

Operating system: All
Processor architecture: All, although x86 has received some changes specific to that platform

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

A number of internal compiler procedures have been made inline.

Additional Notes

Most of the inlined routines are those which only have a single line of code and are relatively simple.
Routines with a single Boolean condition were also inlined if the result was also Boolean, since more often than not the function call is part of a condition, so the subroutine's condition merges into it, or the inlined code simply sets a register based on flags.
There were cases where a Boolean function had a deterministic result, either because the input was deterministic or because the result was always set to the same value. This resulted in large blocks of assembly language getting optimised out.
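
As a rough illustration of the kind of routine described in these notes, here is a minimal sketch. All names in it (TRegKind, IsIntReg, Describe) are hypothetical and not taken from the compiler source. It shows a one-line Boolean function marked inline being used inside a condition, and a call with a constant argument whose result is known at compile time:

```pascal
program InlineSketch;

{$mode objfpc}
{$inline on}

type
  { Hypothetical stand-in for a compiler-internal enumeration }
  TRegKind = (rkNone, rkInt, rkFloat, rkVector);

{ A single Boolean condition with a Boolean result. When inlined,
  the comparison can merge into the caller's own condition. }
function IsIntReg(Kind: TRegKind): Boolean; inline;
begin
  Result := Kind = rkInt;
end;

function Describe(Kind: TRegKind): string;
begin
  { With IsIntReg inlined, this can become a plain compare-and-branch
    instead of a call followed by a test of the returned value. }
  if IsIntReg(Kind) then
    Result := 'integer register'
  else
    Result := 'other';
end;

begin
  WriteLn(Describe(rkInt));
  WriteLn(Describe(rkFloat));
  { Constant input: the result is known at compile time, so an
    optimising build may drop the untaken branch entirely. }
  if IsIntReg(rkNone) then
    WriteLn('never printed');
end.
```

Whether the call is actually inlined, and whether the dead branch is removed, depends on the optimisation settings used; the inline directive is only a request to the compiler.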
Auto-inlining occasionally makes some very bad choices, which is why it isn't ready for general use. For example, it inlines the TFPList.Add method and produces this horrible mess in packages/fcl-res/src/acceleratorsresource.pp:

	movq	40(%rsp),%rdx
	movq	32(%rsp),%rax
	movq	%rax,(%rdx)
	movq	40(%rsp),%r14
	movq	80(%rbx),%r12
	movl	16(%r12),%eax
	cmpl	20(%r12),%eax
	jne	.Lj16
	movl	16(%r12),%eax
	cmpl	20(%r12),%eax
	jnl	.Lj19
	movq	%r12,%r13
	jmp	.Lj16
	.p2align 4,,10
	.p2align 3
.Lj19:
	cmpl	$134217728,20(%r12)
	jng	.Lj21
	movl	$16777216,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj21:
	cmpl	$8388608,20(%r12)
	jng	.Lj24
	movl	20(%r12),%edx
	shrl	$3,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj24:
	cmpl	$128,20(%r12)
	jng	.Lj27
	movl	20(%r12),%edx
	shrl	$2,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj27:
	cmpl	$8,20(%r12)
	jng	.Lj30
	movl	$16,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj30:
	movl	$4,%edx
.Lj22:
	addl	20(%r12),%edx
	movq	%r12,%rcx
	call	CLASSES$_$TFPLIST_$__$$_SETCAPACITY$LONGINT
	movq	%r12,%r13
.Lj16:
	movq	8(%r12),%rax
	movl	16(%r12),%edx
	movq	%r14,(%rax,%rdx,8)
	movl	16(%r12),%eax
	addl	$1,16(%r12)
	cmpl	%esi,%edi
	jnge	.Lj11
.Lj3:

When it isn't inlined:

	movq	40(%rsp),%rdx
	movq	32(%rsp),%rax
	movq	%rax,(%rdx)
	movq	80(%rbx),%rcx
	movq	40(%rsp),%rdx
	call	CLASSES$_$TFPLIST_$__$$_ADD$POINTER$$LONGINT
	cmpl	%esi,%edi
	jnge	.Lj11
.Lj3:

Need I say more? Ignoring the massive code bloat, it introduces 7 new labels!
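
For readers following the inlined listing, the branch ladder between .Lj19 and .Lj22 appears to be the list's capacity-growth calculation, which gets dragged into the caller along with Add. The sketch below reconstructs that calculation purely from the constants in the assembly (134217728, 8388608, 128 and 8). The function name GrowthStep is hypothetical, and it assumes that offset 20 in the listing holds the capacity; it is not copied from the RTL source.

```pascal
program GrowthSketch;

{$mode objfpc}

{ Growth step reconstructed from the inlined listing; an approximation
  based on the assembly, not the actual RTL implementation. }
function GrowthStep(Capacity: LongInt): LongInt;
begin
  if Capacity > 128 * 1024 * 1024 then      { cmpl $134217728 }
    Result := 16 * 1024 * 1024              { movl $16777216,%edx }
  else if Capacity > 8 * 1024 * 1024 then   { cmpl $8388608 }
    Result := Capacity shr 3                { shrl $3,%edx }
  else if Capacity > 128 then               { cmpl $128 }
    Result := Capacity shr 2                { shrl $2,%edx }
  else if Capacity > 8 then                 { cmpl $8 }
    Result := 16                            { movl $16,%edx }
  else
    Result := 4;                            { movl $4,%edx }
end;

begin
  { The listing adds the step to the current capacity before
    calling SetCapacity, which is mirrored here. }
  WriteLn('New capacity from 0:    ', 0 + GrowthStep(0));
  WriteLn('New capacity from 64:   ', 64 + GrowthStep(64));
  WriteLn('New capacity from 1024: ', 1024 + GrowthStep(1024));
end.
```

Each of the five cases costs a compare, a conditional jump and usually an aligned label, which accounts for most of the extra code at every inlined call site.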
