Micro-optimizations to nutils.pas.
This applies the optimizations described in #39502. The most notable is shortcutting `node_count`, which is not really a “micro” optimization, though its actual effect will be on the same level as that of the `foreachnode` change: `node_count` is used much less (`node_count` usages are themselves a very small subset of `foreachnode` usages, since `node_count` uses `foreachnode`), and human-written functions or loops aren’t that big anyway (but are still often enough bigger than the typical `node_count` comparand).
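To illustrate the idea of shortcutting a count that is only ever compared against a small value, here is a minimal sketch. It assumes the shortcut amounts to stopping the traversal once the count exceeds the comparand. `TNode`, `PNode` and `CountUpTo` are hypothetical stand-ins, not the compiler's actual types or the code from the patch.

```pascal
program BoundedCountSketch;
{$mode objfpc}

type
  { Hypothetical stand-in for the compiler's tnode; only the tree shape matters. }
  PNode = ^TNode;
  TNode = record
    left, right: PNode;
  end;

{ Counts the nodes of a subtree but gives up once the count exceeds limit
  (limit >= 0). The result is exact when it is <= limit, otherwise limit + 1. }
function CountUpTo(n: PNode; limit: longint): longint;
begin
  result := 0;
  if n = nil then
    exit;
  result := 1;
  if result > limit then
    exit;
  inc(result, CountUpTo(n^.left, limit - result));
  if result > limit then
    exit; { the right subtree is never visited }
  inc(result, CountUpTo(n^.right, limit - result));
end;

var
  nodes: array[0..6] of TNode;
  i: integer;
begin
  { Build a complete binary tree of 7 nodes stored in an array. }
  for i := 0 to 6 do
  begin
    nodes[i].left := nil;
    nodes[i].right := nil;
  end;
  for i := 0 to 2 do
  begin
    nodes[i].left := @nodes[2 * i + 1];
    nodes[i].right := @nodes[2 * i + 2];
  end;

  writeln(CountUpTo(@nodes[0], 100));   { full count: 7 }
  writeln(CountUpTo(@nodes[0], 3) > 3); { stops early: TRUE }
end.
```

A caller that only needs to know whether a tree is larger than some threshold never pays for walking the rest of a big tree.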
On my code, and according to my highly accurate (←irony) measurements with `QueryPerformanceCounter`:

- `foreachnode` speeds up by 6%, which translates to a 0.4% speedup of the whole compilation, as it is used heavily;
- `node_count` speeds up by two times (1.5 µs to 750 ns), and I also added a shortcut (see `heuristics_favors_autoinlining` at `psub.pas`) that reduces the number of its calls made by autoinlining by another three times. In absolute numbers, this reduces their total running time from 15 ms to 2.5 ms, or the whole `-OoAUTOINLINE` compilation time by 0.2%.
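As a quick consistency check on the `node_count` figures above, the two factors multiply: calls that are twice as fast and three times fewer give a sixfold reduction in total time.

```math
\frac{15\ \text{ms}}{2 \times 3} = 2.5\ \text{ms}
```

Taking the rounded 0.2% at face value, the 12.5 ms saved would correspond to a total compilation time of roughly 6 s; this is a back-of-the-envelope inference from the numbers quoted, not a measured value.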
Regarding `foreachnode`: in addition to what I did before, I have now removed the second `foreachnode` version outright, instead using an adapter from `foreachnodefunction` to `staticforeachnodefunction`, so this also spectacularly reduces code duplication.
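The adapter approach can be sketched roughly as follows. This is a simplified illustration, not the code from the patch: the callback types below take a `longint` and return a `boolean` in place of the real `tnode` and result type, and `TAdapterContext`, `CallMethod`, `ForEachStatic`, `ForEachMethod` and `TVisitor` are hypothetical names.

```pascal
program AdapterSketch;
{$mode objfpc}

type
  { Simplified stand-ins for foreachnodefunction / staticforeachnodefunction. }
  TMethodCallback = function(var n: longint; arg: pointer): boolean of object;
  TStaticCallback = function(var n: longint; arg: pointer): boolean;

  { Hypothetical bundle smuggled through the static callback's arg. }
  PAdapterContext = ^TAdapterContext;
  TAdapterContext = record
    method: TMethodCallback;
    arg: pointer;
  end;

  TVisitor = class
    total: longint;
    function Visit(var n: longint; arg: pointer): boolean;
  end;

function TVisitor.Visit(var n: longint; arg: pointer): boolean;
begin
  inc(total, n);
  result := true;
end;

{ The adapter: a plain function that unpacks the context and forwards the call. }
function CallMethod(var n: longint; arg: pointer): boolean;
begin
  result := PAdapterContext(arg)^.method(n, PAdapterContext(arg)^.arg);
end;

{ The only real traversal, written against the static callback type. }
procedure ForEachStatic(var items: array of longint; f: TStaticCallback; arg: pointer);
var
  i: integer;
begin
  for i := low(items) to high(items) do
    f(items[i], arg);
end;

{ The method-taking variant is now a thin wrapper instead of a second copy. }
procedure ForEachMethod(var items: array of longint; f: TMethodCallback; arg: pointer);
var
  ctx: TAdapterContext;
begin
  ctx.method := f;
  ctx.arg := arg;
  ForEachStatic(items, @CallMethod, @ctx);
end;

var
  data: array[0..2] of longint = (1, 2, 3);
  v: TVisitor;
begin
  v := TVisitor.Create;
  ForEachMethod(data, @v.Visit, nil);
  writeln(v.total); { 6 }
  v.Free;
end.
```

The cost of this design is one extra indirect call per visited item for method callbacks, in exchange for keeping a single traversal routine.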
I suspect that `foreachnode`’s `raisen`, `tryfinallyn` and `tempcreaten` branches can reuse the caller’s arguments like all other branches instead of redirecting to `pm_postprocess`, and so call `self.perform` instead of `foreachnode` too, but I don’t want to risk it.