Do some specialized copypaste for RTTI functions over arrays. (!383)

Work in type-specific batches in fpc_initialize_array / fpc_finalize_array / fpc_addref_array, instead of redirecting each of them to N calls to fpc_initialize / fpc_finalize / fpc_addref. This might be faster at the cost of some duplicated logic. The gain should be largest when initializing arrays of managed pointers, which reduces to a single FillChar instead of repeating “call fpc_initialize, dispatch over TTypeKind, assign that miserable nil finally” for each element.
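A minimal sketch of the difference, not the RTL code itself. TSketchKind, InitOne, InitArrayPerElement and InitArrayBatched are hypothetical names made up for illustration, and plain Pointer stands in for a managed pointer so the compiler's own management code stays out of the way:

```pascal
program BatchInitSketch;
{$mode objfpc}

type
  // Hypothetical stand-in for the RTL's type kinds; not the real TTypeKind.
  TSketchKind = (skManagedPointer, skOther);

// Per-element path: one call and one dispatch for every element.
procedure InitOne(p: Pointer; kind: TSketchKind);
begin
  case kind of
    skManagedPointer: PPointer(p)^ := nil;
    skOther: ; // nothing to do in this sketch
  end;
end;

procedure InitArrayPerElement(p: PPointer; kind: TSketchKind; count: SizeInt);
var
  i: SizeInt;
begin
  for i := 0 to count - 1 do
    InitOne(@p[i], kind);
end;

// Batched path: dispatch once, then handle the whole run in one go.
procedure InitArrayBatched(p: PPointer; kind: TSketchKind; count: SizeInt);
begin
  case kind of
    skManagedPointer: FillChar(p^, count * SizeOf(Pointer), 0);
    skOther: ; // nothing to do in this sketch
  end;
end;

var
  a, b: array[0..7] of Pointer;
  i: Integer;
begin
  // Start from garbage, as uninitialized memory would be.
  FillChar(a, SizeOf(a), $AA);
  FillChar(b, SizeOf(b), $AA);
  InitArrayPerElement(@a[0], skManagedPointer, Length(a));
  InitArrayBatched(@b[0], skManagedPointer, Length(b));
  // Both paths must leave every element nil.
  for i := 0 to High(a) do
    if (a[i] <> nil) or (b[i] <> nil) then
      Halt(1);
  WriteLn('both paths agree');
end.
```

Both procedures produce the same memory contents; the batched one just pays for the case dispatch once per array rather than once per element.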

Benchmark: ManagementBenchmark.pas.

It attempts both to measure raw initialization/finalization overhead and to emulate some useful work to show its relative contribution.

My results (x86-64, averages from 20 minimums of 50 runs):

                                          specialized array functions; actual proposal, almost not a joke
                                            |
                                            |    also avoid initialization after zeroing; partly joke
                                            |      |
                                            |      |    also initialize fields in batches; mostly joke
                                            ↓      ↓      ↓
                                  before   com1   com2   com3
StaticTableInitFinal:               543     173    155    162  ns/call
PrettyMultiplicationTableStatic:   18.6    18.2   18.0   18.1  µs/call

DynamicTableInitFinal:              828     636    627    652  ns/call
PrettyMultiplicationTableDynamic:  18.8    18.7   18.4   18.5  µs/call

StringPairArrayInitFinal:           259     199    150    100  ns/call
VisualizeSomeStringPairs:          3.00    2.98   2.90   2.95  µs/call

String10x10InitFinal:               743     612    437    280  ns/call
String10x10LightWork:              1.87    1.79   1.65   1.37  µs/call
String10x10HarderWork:             6.35    6.22   6.09   5.85  µs/call
String100x10InitFinal:             6.67    5.23   3.50   1.61  µs/call
String100x10LightWork:             17.7    16.9   15.5   12.1  µs/call
String100x10HarderWork:            89.1    88.8   86.2   84.5  µs/call

LoneRecordInitFinal:               22.3    22.0   22.0   22.0  ns/call

Unimpressive for quite a bit of extra code (+120 / 160 / 350 LoC for 1 / 2 / 3 commits)? Maybe.
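For reference, one possible reading of “averages from 20 minimums of 50 runs”, sketched below: split the timings into batches, keep the minimum of each batch, then average those minimums. This is an assumption about the statistic, not code taken from ManagementBenchmark.pas; AverageOfMinimums and the sample values are hypothetical.

```pascal
program AvgOfMinsSketch;
{$mode objfpc}

// Splits samples into consecutive batches of runsPerBatch timings,
// takes the minimum of each complete batch and averages those minimums.
// A trailing incomplete batch is ignored.
function AverageOfMinimums(const samples: array of Double;
  runsPerBatch: SizeInt): Double;
var
  i, batches: SizeInt;
  batchMin, sum: Double;
begin
  sum := 0;
  batches := 0;
  batchMin := 0;
  for i := 0 to High(samples) do
  begin
    // The first sample of a batch resets the running minimum.
    if (i mod runsPerBatch = 0) or (samples[i] < batchMin) then
      batchMin := samples[i];
    // The last sample of a batch commits its minimum.
    if (i + 1) mod runsPerBatch = 0 then
    begin
      sum := sum + batchMin;
      Inc(batches);
    end;
  end;
  if batches = 0 then
    Result := 0
  else
    Result := sum / batches;
end;

const
  // Made-up timings: two batches of three runs each.
  Samples: array[0..5] of Double = (5, 3, 4, 9, 7, 8);
begin
  // Minimums are 3 and 7, so the average is 5.
  if AverageOfMinimums(Samples, 3) <> 5 then
    Halt(1);
  WriteLn('ok');
end.
```

Taking the minimum within a batch filters out scheduler and cache noise, and averaging across batches smooths whatever variation remains between batches.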

Further idea: an array of records can be handled field by field, so if field #4 is a managed pointer, set field #4 of every element to nil in one pass, and so on for the other fields. Not necessarily a good idea to begin with (ideally you would have two versions, record-wise and field-wise, and switch between them depending on the array size and the record field count in each particular case), and suitably written management operators in nested records could turn it into a breaking change, but it shows the kind of solutions that batch processing opens up for future consideration.
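A rough sketch of the field-wise variant, under loud assumptions: TPair and InitFieldWise are hypothetical, plain Pointer fields stand in for managed pointers, and the field offsets are computed from an instance here, whereas real RTL code would read them from the record's RTTI.

```pascal
program FieldWiseSketch;
{$mode objfpc}

type
  // Hypothetical record; plain pointers stand in for managed ones.
  TPair = record
    id: Int32;
    name: Pointer;
    note: Pointer;
  end;

// Field-wise pass: for each pointer field, walk the whole array
// with a record-sized stride and set that field to nil.
procedure InitFieldWise(base: PByte; count, stride: SizeInt;
  const offsets: array of SizeInt);
var
  f, i: SizeInt;
  p: PByte;
begin
  for f := 0 to High(offsets) do
  begin
    p := base + offsets[f];
    for i := 0 to count - 1 do
    begin
      PPointer(p)^ := nil;
      Inc(p, stride);
    end;
  end;
end;

var
  arr: array[0..3] of TPair;
  nameOfs, noteOfs: SizeInt;
  i: Integer;
begin
  // Put recognizable non-nil values everywhere first.
  for i := 0 to High(arr) do
  begin
    arr[i].id := i + 1;
    arr[i].name := @arr;
    arr[i].note := @arr;
  end;
  // Byte offsets of the two pointer fields inside the record.
  nameOfs := PByte(@arr[0].name) - PByte(@arr[0]);
  noteOfs := PByte(@arr[0].note) - PByte(@arr[0]);
  InitFieldWise(PByte(@arr[0]), Length(arr), SizeOf(TPair), [nameOfs, noteOfs]);
  // Pointer fields are nil, the unmanaged field is untouched.
  for i := 0 to High(arr) do
    if (arr[i].name <> nil) or (arr[i].note <> nil) or (arr[i].id <> i + 1) then
      Halt(1);
  WriteLn('ok');
end.
```

Note that this visits fields in a different order than a record-by-record loop would, which is exactly where user-defined management operators could observe the difference.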

Edited May 12, 2023 by Rika
