Draft: [Refactor] Converted internal CSE list objects to a linked list for faster processing
Summary
This merge request refactors the CSE algorithm by replacing the numerous list objects with a customised linked list (it uses custom-made records and the use of New
and Dispose
; no objects are used). This helps to reduce the memory footprint as well as reduce the number of potential time penalties from reallocation, although this is very much system-dependent.
The main objective behind this refactor is to provide a better framework for future CSE improvements, where a linked list lends itself to better performance since it is generally stepped through linearly, and specific index lookups can be replaced with direct pointer references (the equalto
field). But more importantly, it is much better-suited when cherry-picking specific subexpressions and removing them from the list if they are no longer suitable for CSE (e.g. because one of its input variables changed value).
System
- Processor architecture: Cross-platform
Notes
This is a pure refactor. The only output that should change is the optcse
unit.
Under x86_64-win64 -O4
, the binary size is reduced by 512 bytes, from 3,969,536 bytes to 3,969,024 bytes.
I attempted to take some time measurements for this merge request to gauge performance improvements. Because Windows is generally unreliable in this regard due to how slow it is in starting new processes, I opted to use arm-linux
as a bench test. These are the results from running make clean all OPT="-a -O4"
:
Patch
-----
Binary size: 4214620
real 13m19.537s
user 9m59.143s
sys 3m1.711s
real 13m20.950s
user 9m58.520s
sys 3m2.694s
real 13m10.416s
user 9m51.311s
sys 3m4.602s
Trunk
-----
Binary size: 4214620
real 13m34.558s
user 10m1.501s
sys 3m1.903s
real 13m17.470s
user 9m56.555s
sys 3m1.495s
real 13m19.082s
user 10m0.142s
sys 3m2.685s
real 13m33.879s
user 10m5.639s
sys 3m3.997s
real 13m37.954s
user 10m10.863s
sys 3m7.014s
I ran the timing test for main
more times because I was concerned that the longer times were outliers, but it seems it's more common than the times that are similar to the refactor times. Ultimately the time saving is not that pronounced currently (and the binary size doesn't change), but as explained earlier, a linked list lends itself better for the purposes of CSE.