Micro-optimizations to nutils.pas. (!139)

Rika requested to merge runewalsh/source:nutils into main Jan 12, 2022

This applies the optimizations described in #39502. The most notable one is the node_count shortcut, notable in the sense that it is not really “micro”. Its practical effect, however, will only be on the level of the foreachnode change, for two reasons. First, node_count is used much less: it is implemented on top of foreachnode, so its usages are a very small subset of foreachnode usages. Second, human-written functions and loops aren’t that big anyway (though still often enough bigger than a typical node_count comparand).

On my code, and according to my highly accurate (←irony) measurements with QueryPerformanceCounter:

foreachnode speeds up by 6%, which translates to a 0.4% speedup of the whole compilation, as it is used heavily;
node_count becomes twice as fast (1.5 µs to 750 ns), and I also added a shortcut (see heuristics_favors_autoinlining in psub.pas) that reduces the number of node_count calls made by autoinlining by another factor of three. In absolute numbers, this reduces their total running time from 15 ms to 2.5 ms (a combined factor of six), which speeds up the whole -OoAUTOINLINE compilation by 0.2%.
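
The idea behind shortcutting a node count can be illustrated with a minimal sketch. It assumes the shortcut means that counting stops as soon as the count exceeds the value it will be compared against; the description above does not spell this out. The tree type and the names PNode, NewNode, CountInto and ExceedsLimit are hypothetical stand-ins, not the declarations in nutils.pas.

```pascal
program BoundedCountSketch;
{$mode objfpc}

type
  { Hypothetical stand-in for the compiler's node tree: a plain binary tree. }
  PNode = ^TNode;
  TNode = record
    Left, Right: PNode;
  end;

function NewNode(L, R: PNode): PNode;
begin
  New(Result);
  Result^.Left := L;
  Result^.Right := R;
end;

{ Adds the nodes under N to Count, but stops descending
  once Count has already exceeded Limit. }
procedure CountInto(N: PNode; Limit: Cardinal; var Count: Cardinal);
begin
  if (N = nil) or (Count > Limit) then
    Exit;
  Inc(Count);
  CountInto(N^.Left, Limit, Count);
  CountInto(N^.Right, Limit, Count);
end;

{ Answers "is this tree bigger than Limit?" without always
  walking the whole tree. }
function ExceedsLimit(N: PNode; Limit: Cardinal): Boolean;
var
  Count: Cardinal;
begin
  Count := 0;
  CountInto(N, Limit, Count);
  Result := Count > Limit;
end;

var
  Tree: PNode;
begin
  { A tree of 4 nodes; it is not freed because the sketch exits right away. }
  Tree := NewNode(NewNode(nil, nil), NewNode(NewNode(nil, nil), nil));
  WriteLn(ExceedsLimit(Tree, 2));   { stops after the third node }
  WriteLn(ExceedsLimit(Tree, 10));  { has to visit all 4 nodes }
end.
```

With a small comparand, the work is bounded by the comparand rather than by the size of the function body, which is why the gain can be larger than a constant factor on big trees.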

Regarding foreachnode, in addition to what I did before, I have now removed the second foreachnode version outright. It is replaced by an adapter from foreachnodefunction to staticforeachnodefunction, which also spectacularly reduces code duplication.
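
The adapter approach can be sketched as follows. This is only an illustration under simplifying assumptions: the callback types below take an Integer and return a Boolean, whereas the real foreachnodefunction and staticforeachnodefunction operate on compiler nodes, and every name in the sketch (TStaticCallback, TMethodCallback, TAdapterData, CallMethod, ForEachStatic, ForEachMethod, TCollector) is hypothetical. The point is that only one traversal is implemented; the method-pointer variant packs the method and the caller's argument into a record and passes a plain function that unpacks them.

```pascal
program AdapterSketch;
{$mode objfpc}

type
  { Hypothetical, simplified callback shapes. }
  TStaticCallback = function(Value: Integer; Arg: Pointer): Boolean;
  TMethodCallback = function(Value: Integer; Arg: Pointer): Boolean of object;

  { Bundles the method pointer with the caller's original argument. }
  PAdapterData = ^TAdapterData;
  TAdapterData = record
    Method: TMethodCallback;
    Arg: Pointer;
  end;

  TCollector = class
    Sum: Integer;
    function Visit(Value: Integer; Arg: Pointer): Boolean;
  end;

function TCollector.Visit(Value: Integer; Arg: Pointer): Boolean;
begin
  Inc(Sum, Value);
  Result := True;
end;

{ The only real traversal: it knows about plain functions only. }
function ForEachStatic(const Values: array of Integer;
  F: TStaticCallback; Arg: Pointer): Boolean;
var
  I: Integer;
begin
  Result := False;
  for I := Low(Values) to High(Values) do
    if F(Values[I], Arg) then
      Result := True;
end;

{ Plain function that unpacks the record and forwards to the method. }
function CallMethod(Value: Integer; Arg: Pointer): Boolean;
var
  Data: PAdapterData;
begin
  Data := PAdapterData(Arg);
  Result := Data^.Method(Value, Data^.Arg);
end;

{ Method-pointer variant, expressed through the static traversal. }
function ForEachMethod(const Values: array of Integer;
  F: TMethodCallback; Arg: Pointer): Boolean;
var
  Data: TAdapterData;
begin
  Data.Method := F;
  Data.Arg := Arg;
  Result := ForEachStatic(Values, @CallMethod, @Data);
end;

var
  C: TCollector;
begin
  C := TCollector.Create;
  try
    ForEachMethod([1, 2, 3], @C.Visit, nil);
    WriteLn(C.Sum);  { prints 6 }
  finally
    C.Free;
  end;
end.
```

The cost of this design is one extra indirect call per visited element in the method-pointer case, in exchange for keeping a single copy of the traversal logic.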

I suspect that foreachnode’s raisen, tryfinallyn and tempcreaten branches could reuse the caller’s arguments like all other branches, instead of redirecting to pm_postprocess, and so could also call self.perform instead of foreachnode, but I don’t want to risk it.

Edited Jan 14, 2022 by Rika
