[Refactor] Speed optimisations for peephole register tracking functions
Summary
This merge request updates the register tracking functions such as UpdateUsedRegs
to be marginally faster, most importantly by minimising the number of passes through the tai_regalloc
entries.
- RegInUsedRegs, IncludeRegInUsedRegs and ExcludeRegFromUsedRegs are now inlined functions, given they are relatively simple property accesses that compile into relatively few instructions (especially when compared to parameter configuration).
- RegInUsedRegs now uses a var parameter for the register tracking array, mainly to prevent unwanted array copying when inlined.
- The UpdateUsedRegs method where you specify which register tracking array to update has been rewritten to essentially be a copy of the regular UpdateUsedRegs, since the old version passed through the same group of tai_regalloc entries for each type of register. The patch will now only pass through the group once.
- UpdateUsedRegsBetween has been rewritten to be like UpdateUsedRegs in that it only passes through the entries once.
- While the regular UpdateUsedRegs could be inlined and made to just call UpdateUsedRegs(UsedRegs, p), this was decided against in order to reduce register pressure.
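The points above can be illustrated with a minimal, self-contained sketch. It is not the compiler's code: the types (TRegType, TRegNum, TUsedRegs, TRegAllocEntry), the helper ApplyEntry, the procedures UpdatePerType and UpdateSinglePass, and the simplified signature of RegInUsedRegs are all hypothetical stand-ins chosen for illustration. It only shows the general shape of the change, assuming a group of allocation entries is walked once per register type before the patch and once in total after it.

```pascal
program UsedRegsSketch;
{$mode objfpc}
{$inline on}

type
  { Hypothetical, simplified stand-ins; not the compiler's real types }
  TRegType = (rtInt, rtFPU, rtMM);
  TRegNum = 0..31;
  TRegSet = set of TRegNum;
  TUsedRegs = array[TRegType] of TRegSet;
  TAllocKind = (akAlloc, akDealloc);
  TRegAllocEntry = record
    Kind: TAllocKind;
    RegType: TRegType;
    RegNum: TRegNum;
  end;

{ Tiny accessor: "var" passes the array by reference, so no copy is made }
function RegInUsedRegs(RegType: TRegType; RegNum: TRegNum;
  var Used: TUsedRegs): Boolean; inline;
begin
  Result := RegNum in Used[RegType];
end;

{ Apply one allocation or deallocation entry to the tracking array }
procedure ApplyEntry(const E: TRegAllocEntry; var Used: TUsedRegs); inline;
begin
  if E.Kind = akAlloc then
    Include(Used[E.RegType], E.RegNum)
  else
    Exclude(Used[E.RegType], E.RegNum);
end;

{ Assumed old shape: walk the whole group once per register type }
procedure UpdatePerType(const Entries: array of TRegAllocEntry;
  var Used: TUsedRegs; var Visits: Integer);
var
  rt: TRegType;
  i: Integer;
begin
  for rt := Low(TRegType) to High(TRegType) do
    for i := 0 to High(Entries) do
    begin
      Inc(Visits);
      if Entries[i].RegType = rt then
        ApplyEntry(Entries[i], Used);
    end;
end;

{ Assumed new shape: walk the group once, dispatching on each entry's type }
procedure UpdateSinglePass(const Entries: array of TRegAllocEntry;
  var Used: TUsedRegs; var Visits: Integer);
var
  i: Integer;
begin
  for i := 0 to High(Entries) do
  begin
    Inc(Visits);
    ApplyEntry(Entries[i], Used);
  end;
end;

const
  { Made-up sample group of allocation entries }
  Group: array[0..3] of TRegAllocEntry = (
    (Kind: akAlloc;   RegType: rtInt; RegNum: 0),
    (Kind: akAlloc;   RegType: rtMM;  RegNum: 3),
    (Kind: akAlloc;   RegType: rtInt; RegNum: 1),
    (Kind: akDealloc; RegType: rtInt; RegNum: 0)
  );

var
  A, B: TUsedRegs;
  rt: TRegType;
  VisitsA, VisitsB: Integer;
begin
  for rt := Low(TRegType) to High(TRegType) do
  begin
    A[rt] := [];
    B[rt] := [];
  end;
  VisitsA := 0;
  VisitsB := 0;

  UpdatePerType(Group, A, VisitsA);
  UpdateSinglePass(Group, B, VisitsB);

  { Both strategies must leave the tracking arrays in the same state }
  for rt := Low(TRegType) to High(TRegType) do
    if A[rt] <> B[rt] then
    begin
      WriteLn('Mismatch between the two strategies');
      Halt(1);
    end;

  WriteLn('Entries visited, one pass per type: ', VisitsA);
  WriteLn('Entries visited, single pass:       ', VisitsB);
  WriteLn('Integer register 1 in use: ', RegInUsedRegs(rtInt, 1, B));
end.
```

With three register types and four entries, the per-type version visits 12 entries and the single-pass version visits 4, while both end with identical tracking arrays. The real saving in the compiler depends on how many register types and tai_regalloc entries are involved.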
System
- Processor architecture: All (any that use a peephole optimizer)
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Compilation should now be slightly faster
Relevant logs and/or screenshots
This is a pure refactor - other than some changes to the compiler binaries, there should be no additions.
The timings are hard to measure accurately under Windows, since "make" is less efficient on that platform, but these are the results of timing "make clean all" with PowerShell's "Measure-Command" cmdlet under x86_64-win64:
Timings - O2:
Trunk Patch
---- -----
3:53.438 3:40.228
3:57.470 3:42.494
3:53.929 3:44.954
Timings - O3:
Trunk Patch
---- -----
3:42.299 3:47.821
3:49.973 3:50.876
3:50.667 3:54.320
On my x86_64-linux virtual machine, though, the improvements are much more decisive (the timing tool only had centisecond precision):
Timings - O2:
Trunk Patch
---- -----
3:16.16 3:02.20
3:16.25 3:06.69
3:15.39 3:01.44
Timings - O3:
Trunk Patch
---- -----
3:28.38 3:11.43
3:28.27 3:13.43
3:26.03 3:12.91
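
To put a number on the Linux figures, the following sketch averages the three runs in each column and reports the relative change. The helper names (Secs, Mean, Report) are hypothetical and exist only for this illustration; the input values are copied from the two Linux tables above. Three runs per configuration is a small sample, so the result should be read as indicative only.

```pascal
program TimingMeans;
{$mode objfpc}

{ Convert a "minutes:seconds" reading into seconds }
function Secs(Minutes: Integer; Seconds: Double): Double;
begin
  Result := Minutes * 60 + Seconds;
end;

{ Arithmetic mean of a non-empty list of values }
function Mean(const Values: array of Double): Double;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to High(Values) do
    Result := Result + Values[i];
  Result := Result / Length(Values);
end;

{ Print both means and the relative change of Patch against Trunk }
procedure Report(const Name: string; Trunk, Patch: Double);
var
  Change: Double;
begin
  Change := (Patch - Trunk) / Trunk * 100;
  WriteLn(Name, ': trunk ', Trunk:0:2, ' s, patch ', Patch:0:2,
          ' s, change ', Change:0:2, '%');
end;

begin
  { x86_64-linux, O2 }
  Report('O2',
    Mean([Secs(3, 16.16), Secs(3, 16.25), Secs(3, 15.39)]),
    Mean([Secs(3, 2.20), Secs(3, 6.69), Secs(3, 1.44)]));
  { x86_64-linux, O3 }
  Report('O3',
    Mean([Secs(3, 28.38), Secs(3, 28.27), Secs(3, 26.03)]),
    Mean([Secs(3, 11.43), Secs(3, 13.43), Secs(3, 12.91)]));
end.
```

On these figures the mean build time drops by roughly 6 to 7 percent at both optimisation levels. The same helpers can be applied to the Windows tables by substituting their values.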