Compiler auto-inline suggestions
Summary
This merge request was developed by studying the output of the experimental auto-inline feature when compiling the compiler itself, then cherry-picking the good choices. It provides performance gains for the compiler while auto-inline is still in development.
System
- Operating system: All
- Processor architecture: All, although x86 has received some changes specific to that platform
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
A number of internal compiler procedures have been made inline.
Additional Notes
- Most of the inlined routines are relatively simple and consist of a single line of code.
- Routines with a single Boolean condition were also inlined if the result was also Boolean, since more often than not the function call is part of a larger condition, so the subroutine's condition either merges into it or simply sets a register based on the flags.
- There were cases where a Boolean function had a deterministic result, either because the input was deterministic or because the result was always set to the same value. Inlining these allowed large blocks of assembly language to be optimised out.
- Auto-inlining occasionally makes some very bad choices, which is why it isn't ready for general use. For example, it inlines the TFPList.Add method and produces this horrible mess in packages/fcl-res/src/acceleratorsresource.pp:
movq 40(%rsp),%rdx
movq 32(%rsp),%rax
movq %rax,(%rdx)
movq 40(%rsp),%r14
movq 80(%rbx),%r12
movl 16(%r12),%eax
cmpl 20(%r12),%eax
jne .Lj16
movl 16(%r12),%eax
cmpl 20(%r12),%eax
jnl .Lj19
movq %r12,%r13
jmp .Lj16
.p2align 4,,10
.p2align 3
.Lj19:
cmpl $134217728,20(%r12)
jng .Lj21
movl $16777216,%edx
jmp .Lj22
.p2align 4,,10
.p2align 3
.Lj21:
cmpl $8388608,20(%r12)
jng .Lj24
movl 20(%r12),%edx
shrl $3,%edx
jmp .Lj22
.p2align 4,,10
.p2align 3
.Lj24:
cmpl $128,20(%r12)
jng .Lj27
movl 20(%r12),%edx
shrl $2,%edx
jmp .Lj22
.p2align 4,,10
.p2align 3
.Lj27:
cmpl $8,20(%r12)
jng .Lj30
movl $16,%edx
jmp .Lj22
.p2align 4,,10
.p2align 3
.Lj30:
movl $4,%edx
.Lj22:
addl 20(%r12),%edx
movq %r12,%rcx
call CLASSES$_$TFPLIST_$__$$_SETCAPACITY$LONGINT
movq %r12,%r13
.Lj16:
movq 8(%r12),%rax
movl 16(%r12),%edx
movq %r14,(%rax,%rdx,8)
movl 16(%r12),%eax
addl $1,16(%r12)
cmpl %esi,%edi
jnge .Lj11
.Lj3:
When it isn't inlined:
movq 40(%rsp),%rdx
movq 32(%rsp),%rax
movq %rax,(%rdx)
movq 80(%rbx),%rcx
movq 40(%rsp),%rdx
call CLASSES$_$TFPLIST_$__$$_ADD$POINTER$$LONGINT
cmpl %esi,%edi
jnge .Lj11
.Lj3:
Need I say more? Ignoring the massive code bloat, it introduces 7 new labels!
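For contrast, here is a minimal sketch of the kind of routine the notes above describe as a good inlining candidate. It is illustrative only and is not taken from the compiler sources: TNodeInfo, FLAG_VOLATILE, IsVolatile, SupportsFeature and Describe are all hypothetical names invented for this example.

```pascal
program InlineSketch;

{$mode objfpc}
{$inline on}

type
  { Hypothetical stand-in for a compiler-internal record; not a real FPC type. }
  TNodeInfo = record
    Flags: LongWord;
    RefCount: LongInt;
  end;

const
  FLAG_VOLATILE = $01; { hypothetical flag value }

{ A single Boolean expression with a Boolean result. Once inlined, the test
  can become part of the caller's own condition instead of a call followed
  by a check of the returned value. }
function IsVolatile(const Info: TNodeInfo): Boolean; inline;
begin
  Result := (Info.Flags and FLAG_VOLATILE) <> 0;
end;

{ The result is always the same value. Once inlined, the caller's branch
  that depends on it can be removed as dead code. }
function SupportsFeature: Boolean; inline;
begin
  Result := False;
end;

function Describe(const Info: TNodeInfo): string;
begin
  if IsVolatile(Info) and (Info.RefCount > 0) then
    Result := 'volatile, referenced'
  else if SupportsFeature then
    Result := 'feature path' { never taken, since SupportsFeature is always False }
  else
    Result := 'plain';
end;

var
  Info: TNodeInfo;
begin
  Info.Flags := FLAG_VOLATILE;
  Info.RefCount := 2;
  WriteLn(Describe(Info));

  Info.Flags := 0;
  WriteLn(Describe(Info));
end.
```

Compiled with Free Pascal, this should print "volatile, referenced" followed by "plain". Whether the calls are actually expanded in place depends on the compiler version and optimisation settings, so the generated assembly is worth checking rather than assuming.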