[Refactor] Speed optimisations for peephole register tracking functions (!384)

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:updateusedregs-speedup into main Feb 26, 2023

Summary

This merge request makes the register tracking functions, such as UpdateUsedRegs, marginally faster, chiefly by minimising the number of passes through the tai_regalloc entries.

RegInUsedRegs, IncludeRegInUsedRegs and ExcludeRegFromUsedRegs are now inlined functions, given that they are relatively simple property accesses that compile into relatively few instructions (especially when compared with the cost of setting up the parameters for a call).
RegInUsedRegs now uses a var parameter for the register tracking array, mainly to prevent unwanted array copying when inlined.
The UpdateUsedRegs overload that takes the register tracking array to update has been rewritten to essentially be a copy of the regular UpdateUsedRegs, since the old version passed through the same group of tai_regalloc entries once for each type of register. The new version passes through the group only once.
UpdateUsedRegsBetween has been rewritten to be like UpdateUsedRegs in that it only passes through the entries once.
While the regular UpdateUsedRegs could be inlined and made to just call UpdateUsedRegs(UsedRegs, p), this was decided against in order to reduce register pressure.
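
To illustrate the difference between walking a group of allocation entries once per register type and walking it once in total, here is a minimal, self-contained sketch. It is not the compiler's code: TRegType, TRegSet, TTracker, TAllocEntry, ApplyEntry, UpdateMultiPass and UpdateSinglePass are all hypothetical stand-ins, and the real tai_regalloc handling is more involved. The helper also takes its set as a var parameter and is marked inline, loosely mirroring the points above about inlining and avoiding copies.

```pascal
program SinglePassSketch;

{$mode objfpc}
{$inline on}

type
  { Hypothetical stand-ins; these are not the compiler's real declarations }
  TRegType = (rtInt, rtFPU, rtMM);
  TRegSet  = set of 0..31;
  TTracker = array[TRegType] of TRegSet;

  TAllocEntry = record
    RegType: TRegType;
    RegNum: 0..31;
    IsAlloc: Boolean; { True = register becomes live, False = register is freed }
  end;

var
  Visits: Integer = 0; { counts how many entries an approach inspects }

{ Small helper; "var" avoids copying the set when the call is inlined }
procedure ApplyEntry(var Regs: TRegSet; const E: TAllocEntry); inline;
begin
  if E.IsAlloc then
    Include(Regs, E.RegNum)
  else
    Exclude(Regs, E.RegNum);
end;

{ Old shape (assumed): one walk over the whole group per register type }
procedure UpdateMultiPass(var T: TTracker; const Group: array of TAllocEntry);
var
  RT: TRegType;
  i: Integer;
begin
  for RT := Low(TRegType) to High(TRegType) do
    for i := 0 to High(Group) do
    begin
      Inc(Visits);
      if Group[i].RegType = RT then
        ApplyEntry(T[RT], Group[i]);
    end;
end;

{ New shape (assumed): a single walk, dispatching on each entry's type }
procedure UpdateSinglePass(var T: TTracker; const Group: array of TAllocEntry);
var
  i: Integer;
begin
  for i := 0 to High(Group) do
  begin
    Inc(Visits);
    ApplyEntry(T[Group[i].RegType], Group[i]);
  end;
end;

const
  Group: array[0..3] of TAllocEntry = (
    (RegType: rtInt; RegNum: 0; IsAlloc: True),
    (RegType: rtMM;  RegNum: 3; IsAlloc: True),
    (RegType: rtInt; RegNum: 1; IsAlloc: True),
    (RegType: rtInt; RegNum: 0; IsAlloc: False)
  );

var
  A, B: TTracker;
  RT: TRegType;
  MultiVisits, SingleVisits: Integer;

begin
  for RT := Low(TRegType) to High(TRegType) do
  begin
    A[RT] := [];
    B[RT] := [];
  end;

  UpdateMultiPass(A, Group);
  MultiVisits := Visits;

  Visits := 0;
  UpdateSinglePass(B, Group);
  SingleVisits := Visits;

  { Both approaches must end with identical tracking state }
  for RT := Low(TRegType) to High(TRegType) do
    if A[RT] <> B[RT] then
      Halt(1);

  WriteLn('multi-pass visits:  ', MultiVisits);
  WriteLn('single-pass visits: ', SingleVisits);
end.
```

With the four-entry group above, the multi-pass version inspects 12 entries (4 entries for each of the 3 register types) while the single-pass version inspects 4, and both leave the trackers in the same state.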

System

Processor architecture: All (any that use a peephole optimizer)

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Compilation should now be slightly faster.

Relevant logs and/or screenshots

This is a pure refactor - other than some changes to the compiler binaries, there should be no additions.

The timings are hard to measure accurately under Windows, since "make" is less efficient on that platform, but the following were taken with PowerShell's "Measure-Command" cmdlet for "make clean all" under x86_64-win64:

Timings - O2:

Trunk     Patch
----      -----
3:53.438  3:40.228
3:57.470  3:42.494
3:53.929  3:44.954

Timings - O3:

Trunk     Patch
----      -----
3:42.299  3:47.821
3:49.973  3:50.876
3:50.667  3:54.320

On my x86_64-linux virtual machine, though, the improvements are much more decisive (the timing tool only had centisecond precision)