[Cross-platform] Optimisation of "Str" intrinsic with constant actual parameter (and supporting functionality)

Summary

This merge request adds some new features to TCallNode to better support the optimisation of internal compiler procedures:

  • A new intrinsiccode field helps track which call nodes were created from language intrinsics.
  • Parameter nodes can be fetched based on their original order (the first pass tends to reorder them to fit into registers and on the stack better).
  • To showcase the above features, the Str intrinsic, when converting an integer into a string, is now optimised into a simple assignment if the input is an integer constant. For example, the nodes for Str(5, Output); become the equivalent of Output := '5';.
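
The equivalence described in the last bullet can be checked from the user side with a minimal sketch like the one below. The program and variable names are made up for illustration; it only shows that both statements leave the same value in the destination, not how the compiler rewrites the node tree.

```pascal
program StrConstDemo;

var
  ViaStr, ViaAssign: ShortString;
begin
  { Str with an integer constant as its first actual parameter. }
  Str(5, ViaStr);

  { The plain assignment that the optimised nodes are equivalent to. }
  ViaAssign := '5';

  { Both routes must produce the same string. }
  if ViaStr <> ViaAssign then
    Halt(1);

  WriteLn(ViaStr);
end.
```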

System

  • Processor architecture: All

Additional notes

  • Str and its wrapper functions (e.g. IntToStr) are rarely called directly with constant actual parameters, but constant propagation and function inlining can expose such constants, which can then be optimised.
  • This optimisation and its framework have been developed to benefit pure functions (currently in development), the idea being that IntToStr, among other things, will become a pure function.
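
As a rough illustration of the first note, the sketch below shows one way a constant could reach Str indirectly. ToText is a hypothetical wrapper, not an RTL routine, and whether the compiler actually inlines the call and then folds the Str depends on the optimisation settings; the merge request does not confirm this particular case.

```pascal
program InlineStrDemo;
{$mode objfpc}
{$inline on}

{ Hypothetical wrapper: Str is never written with a literal here. }
function ToText(Value: LongInt): ShortString; inline;
begin
  Str(Value, Result);
end;

begin
  { If this call is inlined, Value is the constant 208 at the call site,
    so the Str inside the wrapper sees a constant actual parameter. }
  WriteLn(ToText(208));
end.
```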

Relevant logs and/or screenshots

Apart from the source files changed by this merge request, the optimisation triggers in only one place in the compiler and RTL sources (x86_64-win64, -O4). Before:

.section .text.n_ppu$_$tppufile_$__$$_newheader,"ax"
	.balign 16,0x90
.globl	PPU$_$TPPUFILE_$__$$_NEWHEADER
PPU$_$TPPUFILE_$__$$_NEWHEADER:
.Lc25:
.seh_proc PPU$_$TPPUFILE_$__$$_NEWHEADER
	...
	leaq	32(%rsp),%r8
# Peephole Optimization: movq $255,%r9 -> movl $255,%r9d (immediate can be represented with just 32 bits)
	movl	$255,%r9d
	movq	$-1,%rdx
# Peephole Optimization: movq $208,%rcx -> movl $208,%ecx (immediate can be represented with just 32 bits)
	movl	$208,%ecx
	call	fpc_shortstr_sint
	jmp	.Lj30
	...

After:

.section .text.n_ppu$_$tppufile_$__$$_newheader,"ax"
	.balign 16,0x90
.globl	PPU$_$TPPUFILE_$__$$_NEWHEADER
PPU$_$TPPUFILE_$__$$_NEWHEADER:
.Lc25:
.seh_proc PPU$_$TPPUFILE_$__$$_NEWHEADER
	...
	leaq	_$PPU$_Ld1(%rip),%r8
	leaq	32(%rsp),%rcx
# Peephole Optimization: movq $255,%rdx -> movl $255,%edx (immediate can be represented with just 32 bits)
	movl	$255,%edx
	call	fpc_shortstr_to_shortstr
	jmp	.Lj30
	...

These lines of assembly language correspond to the statement str(currentppuversion,s);, where currentppuversion is a constant currently equal to 208. The optimised assembly language refers to the symbol _$PPU$_Ld1, which holds the ASCII text \003208\000: the number 208 as a string, prefixed with its length (3) and suffixed with a null terminator. Because the statement is now effectively s := '208'; internally, the rest of the procedure could benefit from constant propagation at a later date.
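
The length-prefixed layout of that constant can be seen from Pascal code with a small sketch such as the following. It assumes the usual ShortString representation, where element 0 holds the length; the trailing null byte in the emitted constant is not part of the string value and is not visible here.

```pascal
program ShortStrLayout;

var
  S: ShortString;
  I: Integer;
begin
  S := '208';

  { Element 0 of a ShortString stores its length as a byte. }
  WriteLn('length byte: ', Ord(S[0]));

  { The characters follow at indices 1..Length(S). }
  for I := 1 to Length(S) do
    WriteLn('char ', I, ': ', S[I]);
end.
```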

NOTE: fpc_shortstr_sint is a relatively complex procedure, at least compared to fpc_shortstr_to_shortstr, which is a simple memory transfer, so this saves time either way.

Edited by J. Gareth "Kit" Moreton