Skip to content

[Cross-platform] Pure Functions!

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:pure into main

Summary

After 2 years of development, the first alpha version of pure function support for Free Pascal is here!

Pure functions are subroutines that have no side-effects. More simply, they don't depend on or modify anything outside of their scope, such as global variables or external files. If a pure function is given a specific input, it is guaranteed to return the same output every time. As such, many optimisations can be made based on this assumption, including calculating their deterministic result at compile time.

To define a function as pure, specify the new pure directive in the interface section, much like you would do with inline, say. For example,

function MyPureFunction(Input: Integer): Integer; pure;

The closest equivalent in C++ is constexpr.

I consider this feature to be "alpha" or "pre-alpha", but I have successfully marked a number of internal compiler routines as pure, as well as IntToStr and a couple of other functions in SysUtils.

Pure function calls whose actual parameters are constants are evaluated in full and warnings will be thrown if the compiler ends up with a non-deterministic result during analysis.

I would love for others to thoroughly test this feature in as many creative and diabolical ways as you can come up with!

System

  • Processor architecture: Cross-platform

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Functions can be defined as pure if they have no side-effects to open up new optimisations.

Additional Notes / Current Shortcomings

  • FPC has a new pre-defined symbol named FPC_HAS_PURE that can be used to safely add pure; directives to existing code (e.g. via {$ifdef FPC_HAS_PURE}pure;{$endif}).
  • There is some crossover between pure functions and inline functions, but there are different use-cases. For example, MD5 and other cryptographic hash functions are pure (so you could use them to derive the hash of a known string constant), but are definitely not something you would want to inline.
  • Pure functions are relatively slow to compile, which is why you have to explicitly state if a subroutine is pure or not, since the compiler has to explicitly make sure the subroutine is actually eligible to be pure.
  • The proposed feature of being able to use pure functions to define constants (e.g. const sin1 = Sin(1);) is not yet available as this requires a different form of analysis. This will come at a later date.
  • There are some safeguards to catch maliciously-written functions (e.g. those with infinite loops), but haven't been fully tested yet.
  • Because pure functions have no side-effects, the compiler routine might_have_sideeffects returns False on a call node that calls a pure function.
  • If writable typed constants are enabled (which seems to be the default option), functions that use these constants are not eligible to be pure, since the constants (which are considered global even if defined within the scope of the subroutine) can change value. This is why the cutils function reverse_byte is not pure.
  • There are some inefficiencies with label definitions if a pure function is called normally (usually because its actual parameters are variables and hence non-deterministic). This is still being resolved, but doesn't produce bad code. (EDIT: This has been largely resolved... see comments below)
  • There are still some problems with recursive pure functions with out parameters because if said parameter is passed into another pure function, it gets marked as having its address taken, which makes it ineligible or otherwise difficult to optimise.

Relevant logs and/or screenshots

Many of the current savings are actually due to commits that aim to support pure functions - for example, in aasmcnst (-O4, x86_64-win64) - before:

.section .text.n_aasmcnst$_$ttai_typedconstbuilder_$__$$_queue_vecn$tdef$tconstexprint,"ax"
	...
	movq	$0,-32(%rbp)
	movq	$0,-24(%rbp)
	leaq	-32(%rbp),%rax
	movq	%rax,32(%rsp)
	leaq	_$AASMCNST$_Ld20(%rip),%r8
	leaq	-296(%rbp),%rcx
	movl	$255,%edx
	call	fpc_shortstr_to_shortstr
	leaq	-296(%rbp),%rdx
	leaq	-40(%rbp),%rcx
	xorl	%r8d,%r8d
	call	fpc_shortstr_to_ansistr
	...

Thanks to optimisations on type conversion nodes, the fpc_shortstr_to_shortstr call gets removed - after:

.section .text.n_aasmcnst$_$ttai_typedconstbuilder_$__$$_queue_vecn$tdef$tconstexprint,"ax"
	...
	movq	$0,-32(%rbp)
	movq	$0,-24(%rbp)
	leaq	-32(%rbp),%rax
	movq	%rax,32(%rsp)
	leaq	_$AASMCNST$_Ld20(%rip),%rdx
	leaq	-40(%rbp),%rcx
	xorl	%r8d,%r8d
	call	fpc_shortstr_to_ansistr
	...

It is hard to gauge many of the improvements currently due to the inefficiency with label numbering that was introduced, but some routines that call pure functions seem to gain some small efficiencies - for example, in the fpmasks unit - before:

.section .text.n_fpmasks$_$tmask_$_create$utf8string$$tmask_$$_addcharset,"ax"
	...
.Lc35:
	pushq	%r13
.seh_pushreg %r13
.Lc36:
	leaq	-80(%rsp),%rsp
.Lc37:
.seh_stackalloc 80
.seh_endprologue
	...
.Lj82:
	movb	%sil,%r13b
	movq	-8(%rbx),%rdx
	movslq	-28(%rbx),%rax
	movzbl	-1(%rdx,%rax,1),%ecx
	call	SYSTEM_$$_UPCASE$ANSICHAR$$ANSICHAR
	cmpb	%r13b,%al
	jnae	.Lj84
	movb	%r13b,%r12b
	subb	$1,%r12b
	...
	leaq	80(%rsp),%rsp
	popq	%r13
.Lc38:
	...

Ignoring the different label numbers - after:

.section .text.n_fpmasks$_$tmask_$_create$utf8string$$tmask_$$_addcharset,"ax"
	...
.Lc35:
	leaq	-72(%rsp),%rsp
.Lc36:
.seh_stackalloc 72
.seh_endprologue
	...
.Lj94:
	movq	-8(%rbx),%rdx
	movslq	-28(%rbx),%rax
	movzbl	-1(%rdx,%rax,1),%ecx
	call	SYSTEM_$$_UPCASE$ANSICHAR$$ANSICHAR
	cmpb	%al,%sil
	jnbe	.Lj96
	movb	%sil,%r12b
	subb	$1,%r12b
	...
	leaq	72(%rsp),%rsp
	...
Edited by J. Gareth "Kit" Moreton

Merge request reports