Draft: [Refactor] Converted internal CSE list objects to a linked list for faster processing (!725) · Merge requests · FPC / FPC / FPC Source

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:cse-improvement into main Jul 21, 2024

Summary

This merge request refactors the CSE algorithm by replacing the numerous list objects with a customised linked list (it uses custom-made records and the use of New and Dispose; no objects are used). This helps to reduce the memory footprint as well as reduce the number of potential time penalties from reallocation, although this is very much system-dependent.

The main objective behind this refactor is to provide a better framework for future CSE improvements, where a linked list lends itself to better performance since it is generally stepped through linearly, and specific index lookups can be replaced with direct pointer references (the equalto field). But more importantly, it is much better-suited when cherry-picking specific subexpressions and removing them from the list if they are no longer suitable for CSE (e.g. because one of its input variables changed value).

System

Processor architecture: Cross-platform

Notes

This is a pure refactor. The only output that should change is the optcse unit.

Under x86_64-win64 -O4, the binary size is reduced by 512 bytes, from 3,969,536 bytes to 3,969,024 bytes.

I attempted to take some time measurements for this merge request to gauge performance improvements. Because Windows is generally unreliable in this regard due to how slow it is in starting new processes, I opted to use arm-linux as a bench test. These are the results from running make clean all OPT="-a -O4":

Patch
-----

Binary size: 4214620

real	13m19.537s
user	9m59.143s
sys	3m1.711s

real	13m20.950s
user	9m58.520s
sys	3m2.694s

real	13m10.416s
user	9m51.311s
sys	3m4.602s

Trunk
-----
Binary size: 4214620

real	13m34.558s
user	10m1.501s
sys	3m1.903s

real	13m17.470s
user	9m56.555s
sys	3m1.495s

real	13m19.082s
user	10m0.142s
sys	3m2.685s

real	13m33.879s
user	10m5.639s
sys	3m3.997s

real	13m37.954s
user	10m10.863s
sys	3m7.014s

I ran the timing test for main more times because I was concerned that the longer times were outliers, but it seems it's more common than the times that are similar to the refactor times. Ultimately the time saving is not that pronounced currently (and the binary size doesn't change), but as explained earlier, a linked list lends itself better for the purposes of CSE.

Draft: [Refactor] Converted internal CSE list objects to a linked list for faster processing

Summary

System

Notes

Merge request reports