[Cross-platform] Optimisation of "Str" intrinsic with constant actual parameter (and supporting functionality)
Summary
This merge request adds some new features to TCallNode to better support the optimisation of internal compiler procedures:
- A new intrinsiccode field helps track which call nodes were created from language intrinsics.
- Parameter nodes can be fetched based on their original order (the first pass tends to reorder them to fit into registers and on the stack better).
- To showcase the above features, the Str intrinsic, when converting an integer into a string, is now optimised into a simple assignment if the input is an integer constant. For example, the nodes for Str(5, Output); become the equivalent of Output := '5';.
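At source level, the effect of the Str optimisation can be sketched as follows. This is only an illustration of the observable behaviour, not the node transformation inside the compiler; the program and variable names (StrConstDemo, ViaStr, ViaAssign) are hypothetical.

```pascal
program StrConstDemo;

{$mode objfpc}

var
  ViaStr, ViaAssign: ShortString;
begin
  { Integer-to-string conversion with a constant actual parameter. }
  Str(5, ViaStr);

  { What the call above is described as becoming after the optimisation. }
  ViaAssign := '5';

  { Both variables should hold the same one-character string. }
  if ViaStr = ViaAssign then
    WriteLn('identical: ', ViaStr)
  else
    WriteLn('different');
end.
```

Since both statements leave the same value in the destination, replacing the call with the assignment should not change program behaviour.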
System
- Processor architecture: All
Additional notes
- Generally, Str and its wrapper functions (e.g. IntToStr) aren't called with actual parameters that are constants, but there may be situations through constant propagation and function inlining where such constants appear and can hence be optimised.
- This optimisation and its framework have been developed to benefit pure functions (currently in development), the idea being that IntToStr, among other things, will become a pure function.
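One way a constant might reach Str indirectly is through an inlined wrapper. The sketch below assumes a hypothetical wrapper NumberToText and a hypothetical constant CurrentVersion; whether the call is actually inlined depends on the compiler and the optimisation settings.

```pascal
program InlineStrDemo;

{$mode objfpc}

const
  CurrentVersion = 208;  { hypothetical constant, for illustration only }

{ Hypothetical wrapper, standing in for something like IntToStr. }
function NumberToText(Value: LongInt): ShortString; inline;
begin
  Str(Value, Result);
end;

begin
  { If the call is inlined, Value is replaced by the constant 208, so the
    Str call inside the wrapper ends up with a constant actual parameter. }
  WriteLn(NumberToText(CurrentVersion));
end.
```

The call site never passes a literal to Str directly; the constant only appears once the wrapper body has been substituted in.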
Relevant logs and/or screenshots
Outside of the source files that were changed, the optimisation applies in only one place in the compiler and RTL sources (under x86_64-win64, -O4). Before:
.section .text.n_ppu$_$tppufile_$__$$_newheader,"ax"
.balign 16,0x90
.globl PPU$_$TPPUFILE_$__$$_NEWHEADER
PPU$_$TPPUFILE_$__$$_NEWHEADER:
.Lc25:
.seh_proc PPU$_$TPPUFILE_$__$$_NEWHEADER
...
leaq 32(%rsp),%r8
# Peephole Optimization: movq $255,%r9 -> movl $255,%r9d (immediate can be represented with just 32 bits)
movl $255,%r9d
movq $-1,%rdx
# Peephole Optimization: movq $208,%rcx -> movl $208,%ecx (immediate can be represented with just 32 bits)
movl $208,%ecx
call fpc_shortstr_sint
jmp .Lj30
...
After:
.section .text.n_ppu$_$tppufile_$__$$_newheader,"ax"
.balign 16,0x90
.globl PPU$_$TPPUFILE_$__$$_NEWHEADER
PPU$_$TPPUFILE_$__$$_NEWHEADER:
.Lc25:
.seh_proc PPU$_$TPPUFILE_$__$$_NEWHEADER
...
leaq _$PPU$_Ld1(%rip),%r8
leaq 32(%rsp),%rcx
# Peephole Optimization: movq $255,%rdx -> movl $255,%edx (immediate can be represented with just 32 bits)
movl $255,%edx
call fpc_shortstr_to_shortstr
jmp .Lj30
...
These lines of assembly language correspond to the statement str(currentppuversion,s);, where currentppuversion is a constant currently equal to 208. The optimised assembly language refers to the symbol _$PPU$_Ld1, which is the ASCII text \003208\000: the number 208 as a string prefixed with a length (3) and suffixed with a null terminator. As a result, since the line of code essentially changes to s := '208'; internally, the rest of the procedure could easily benefit from constant propagation at a later date.
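The length-prefixed layout described above can be inspected from Pascal itself. This is a small sketch with hypothetical names (ShortStrLayout, S, I); it shows only the length byte and the character bytes of a ShortString variable, not the trailing null terminator of the emitted constant.

```pascal
program ShortStrLayout;

{$mode objfpc}

var
  S: ShortString;
  I: Integer;
begin
  S := '208';

  { Byte 0 of a ShortString holds its length. }
  WriteLn('length byte: ', Ord(S[0]));

  { Bytes 1..Length(S) hold the characters themselves. }
  for I := 1 to Length(S) do
    WriteLn('byte ', I, ': ', Ord(S[I]), ' (', S[I], ')');
end.
```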
NOTE: fpc_shortstr_sint is a relatively complex procedure, at least compared to fpc_shortstr_to_shortstr, which is a simple memory transfer, so this is a time saving either way.