
Compiler auto-inline suggestions

Summary

This merge request was developed by studying what the experimental auto-inline feature chose to inline when compiling the compiler itself, and cherry-picking the good choices. It gives the compiler a performance boost while auto-inline is still in development.

System

  • Operating system: All
  • Processor architecture: All, although x86 has received some changes specific to that platform

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

A number of internal compiler procedures have been made inline.
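
As an illustration of the kind of change involved, here is a minimal sketch in Free Pascal. The routine `IsSmallOffset` and its caller are hypothetical and are not taken from the patch; they only show the shape of a single-line Boolean routine marked with the `inline` directive.

```pascal
program InlineSketch;

{$mode objfpc}
{$inline on}

{ Hypothetical helper, not from the patch: one line of code, one
  Boolean condition, Boolean result. }
function IsSmallOffset(Offset: LongInt): Boolean; inline;
begin
  Result := Offset < 128;
end;

var
  Offset: LongInt;
begin
  Offset := 64;
  { With the call inlined, the compiler is free to fold the comparison
    into this condition instead of calling the routine and testing the
    returned Boolean. }
  if IsSmallOffset(Offset) then
    WriteLn('short encoding')
  else
    WriteLn('long encoding');
end.
```

Whether the call is actually expanded in place still depends on the compiler settings and on the compiler accepting the routine as inlinable; the directive is a request, not a guarantee.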

Additional Notes

  • Most of the inlined routines are those which only have a single line of code and are relatively simple.
  • Routines with a single Boolean condition were also optimised if the result was also Boolean, since more often than not the function call is part of a condition, so the subroutine's condition merges into it or simply sets a register based on the flags.
  • There were cases where a Boolean function had a deterministic result, either because the input was deterministic or because the result was always set to the same value. This resulted in large blocks of assembly language getting optimised out.
  • Auto-inlining occasionally makes some very bad choices, which is why it isn't ready for general use. For example, it inlines the TFPList.Add method and produces this horrible mess in packages/fcl-res/src/acceleratorsresource.pp:
	movq	40(%rsp),%rdx
	movq	32(%rsp),%rax
	movq	%rax,(%rdx)
	movq	40(%rsp),%r14
	movq	80(%rbx),%r12
	movl	16(%r12),%eax
	cmpl	20(%r12),%eax
	jne	.Lj16
	movl	16(%r12),%eax
	cmpl	20(%r12),%eax
	jnl	.Lj19
	movq	%r12,%r13
	jmp	.Lj16
	.p2align 4,,10
	.p2align 3
.Lj19:
	cmpl	$134217728,20(%r12)
	jng	.Lj21
	movl	$16777216,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj21:
	cmpl	$8388608,20(%r12)
	jng	.Lj24
	movl	20(%r12),%edx
	shrl	$3,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj24:
	cmpl	$128,20(%r12)
	jng	.Lj27
	movl	20(%r12),%edx
	shrl	$2,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj27:
	cmpl	$8,20(%r12)
	jng	.Lj30
	movl	$16,%edx
	jmp	.Lj22
	.p2align 4,,10
	.p2align 3
.Lj30:
	movl	$4,%edx
.Lj22:
	addl	20(%r12),%edx
	movq	%r12,%rcx
	call	CLASSES$_$TFPLIST_$__$$_SETCAPACITY$LONGINT
	movq	%r12,%r13
.Lj16:
	movq	8(%r12),%rax
	movl	16(%r12),%edx
	movq	%r14,(%rax,%rdx,8)
	movl	16(%r12),%eax
	addl	$1,16(%r12)
	cmpl	%esi,%edi
	jnge	.Lj11
.Lj3:

When it isn't inlined:

	movq	40(%rsp),%rdx
	movq	32(%rsp),%rax
	movq	%rax,(%rdx)
	movq	80(%rbx),%rcx
	movq	40(%rsp),%rdx
	call	CLASSES$_$TFPLIST_$__$$_ADD$POINTER$$LONGINT
	cmpl	%esi,%edi
	jnge	.Lj11
.Lj3:

Need I say more? Ignoring the massive code bloat, it introduces 7 new labels!
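
For readers who want to see where the labels come from: the compare-and-branch ladder in the inlined listing looks like a capacity growth heuristic that was pulled in along with Add. The sketch below restates in Free Pascal what that ladder computes, read off from the assembly alone and not from the Classes unit source; the name `GrowthStep` is hypothetical.

```pascal
program GrowthSketch;

{$mode objfpc}

{ Hypothetical restatement of the branch ladder in the inlined listing.
  Returns the amount that gets added to the current capacity before the
  call to SetCapacity. }
function GrowthStep(Capacity: LongInt): LongInt;
begin
  if Capacity > 134217728 then      { cmpl $134217728 / movl $16777216 }
    Result := 16777216
  else if Capacity > 8388608 then   { .Lj21: shrl $3 }
    Result := Capacity shr 3
  else if Capacity > 128 then       { .Lj24: shrl $2 }
    Result := Capacity shr 2
  else if Capacity > 8 then         { .Lj27: movl $16 }
    Result := 16
  else                              { .Lj30: movl $4 }
    Result := 4;
end;

begin
  WriteLn(GrowthStep(0));     { prints 4 }
  WriteLn(GrowthStep(100));   { prints 16 }
  WriteLn(GrowthStep(1000));  { prints 250 }
end.
```

Each branch of this ladder needs its own jump target once it is expanded at the call site, which accounts for most of the extra labels in the inlined version.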
