Do some specialized copypaste for RTTI functions over arrays.

Rika requested to merge runewalsh/source:batch_arrays into main

Work in type-specific batches in fpc_initialize_array / fpc_finalize_array / fpc_addref_array instead of redirecting them to N calls to fpc_initialize / fpc_finalize / fpc_addref. This might be faster at the cost of some duplicated logic, especially when initializing arrays of managed pointers, which reduces to a single FillChar instead of repeating “call fpc_initialize, dispatch over TTypeKind, finally assign that miserable nil” for each element.
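To make the difference concrete, here is a minimal sketch of the two strategies. It is not the RTL code from this merge request: TSketchKind, InitOne, InitArrayPerElement and InitArrayBatched are hypothetical names, the type-kind enumeration is reduced to two values, and managed pointers are modelled as plain Pointer slots.

```pascal
program BatchInitSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical, heavily reduced stand-in for the RTL's TTypeKind.
  TSketchKind = (skManagedPointer, skOther);

// Per-element path: dispatch on the kind, then store a single nil.
procedure InitOne(Data: Pointer; Kind: TSketchKind);
begin
  case Kind of
    skManagedPointer: PPointer(Data)^ := nil;
    skOther: ; // other kinds are left out of this sketch
  end;
end;

// "Before" shape: N calls and N dispatches for N elements.
procedure InitArrayPerElement(Data: Pointer; Kind: TSketchKind;
  Count, ElemSize: SizeInt);
var
  i: SizeInt;
begin
  for i := 0 to Count - 1 do
    InitOne(PByte(Data) + i * ElemSize, Kind);
end;

// "After" shape: one dispatch, then one FillChar over the whole range.
procedure InitArrayBatched(Data: Pointer; Kind: TSketchKind;
  Count, ElemSize: SizeInt);
begin
  case Kind of
    skManagedPointer: FillChar(Data^, Count * ElemSize, 0);
    skOther: ; // other kinds are left out of this sketch
  end;
end;

var
  A, B: array[0..7] of Pointer;
  i: Integer;
begin
  // Pre-fill with garbage so that the effect of initialization is visible.
  FillChar(A, SizeOf(A), $AA);
  FillChar(B, SizeOf(B), $AA);

  InitArrayPerElement(@A[0], skManagedPointer, Length(A), SizeOf(Pointer));
  InitArrayBatched(@B[0], skManagedPointer, Length(B), SizeOf(Pointer));

  // Both paths must leave every slot nil.
  for i := Low(A) to High(A) do
    if (A[i] <> nil) or (B[i] <> nil) then
      Halt(1);
  WriteLn('both paths produced all-nil arrays');
end.
```

The batched version pays for the dispatch once per array rather than once per element, which is the duplicated-logic trade-off described above.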

Benchmark: ManagementBenchmark.pas.

It attempts both to measure raw initialization/finalization overhead and to emulate some useful work to show its relative contribution.

My results (x86-64, averages from 20 minimums of 50 runs):
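One way to read “averages from 20 minimums of 50 runs” is sketched below. This is an assumption about the methodology, not code from ManagementBenchmark.pas: SampleWorkload, TimeOneRun and AverageOfMinimums are hypothetical names, and GetTickCount64 is used only because it is readily available; its millisecond resolution is far too coarse for the ns/call figures reported here.

```pascal
program MinOfRunsSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TWorkload = procedure;

// Hypothetical workload: a local array of strings that the compiler
// has to initialize on entry and finalize on exit.
procedure SampleWorkload;
var
  S: array[0..9] of string;
  i: Integer;
begin
  for i := 0 to High(S) do
    S[i] := 'x';
end;

// Duration of one run (CallsPerRun calls of Work), in milliseconds.
function TimeOneRun(Work: TWorkload; CallsPerRun: Integer): Double;
var
  t0: QWord;
  i: Integer;
begin
  t0 := GetTickCount64;
  for i := 1 to CallsPerRun do
    Work();
  Result := GetTickCount64 - t0;
end;

// Take the minimum over RunsPerBatch runs, repeat that Batches times,
// and average the minimums.
function AverageOfMinimums(Work: TWorkload;
  Batches, RunsPerBatch, CallsPerRun: Integer): Double;
var
  b, r: Integer;
  t, best, sum: Double;
begin
  sum := 0;
  for b := 1 to Batches do
  begin
    best := TimeOneRun(Work, CallsPerRun);
    for r := 2 to RunsPerBatch do
    begin
      t := TimeOneRun(Work, CallsPerRun);
      if t < best then
        best := t;
    end;
    sum := sum + best;
  end;
  Result := sum / Batches;
end;

begin
  WriteLn('average of minimums, ms per run: ',
    AverageOfMinimums(@SampleWorkload, 20, 50, 10000):0:3);
end.
```

Taking the minimum within each batch discards runs disturbed by other system activity, and averaging the minimums smooths out what remains.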

                                          specialized array functions; actual proposal, almost not a joke
                                            |
                                            |    also avoid initialization after zeroing; partly joke
                                            |      |
                                            |      |    also initialize fields in batches; mostly joke
                                            ↓      ↓      ↓
                                  before   com1   com2   com3
StaticTableInitFinal:               543     173    155    162  ns/call
PrettyMultiplicationTableStatic:   18.6    18.2   18.0   18.1  µs/call

DynamicTableInitFinal:              828     636    627    652  ns/call
PrettyMultiplicationTableDynamic:  18.8    18.7   18.4   18.5  µs/call

StringPairArrayInitFinal:           259     199    150    100  ns/call
VisualizeSomeStringPairs:          3.00    2.98   2.90   2.95  µs/call

String10x10InitFinal:               743     612    437    280  ns/call
String10x10LightWork:              1.87    1.79   1.65   1.37  µs/call
String10x10HarderWork:             6.35    6.22   6.09   5.85  µs/call
String100x10InitFinal:             6.67    5.23   3.50   1.61  µs/call
String100x10LightWork:             17.7    16.9   15.5   12.1  µs/call
String100x10HarderWork:            89.1    88.8   86.2   84.5  µs/call

LoneRecordInitFinal:               22.3    22.0   22.0   22.0  ns/call

Unimpressive for quite a bit of extra code (+120 / 160 / 350 LoC for 1 / 2 / 3 commits)? Maybe.

Further idea: an array of records can be handled field by field, so if field #4 is a managed pointer, initialize all of those fields to nil in one pass, and so on. This is not necessarily a good idea to begin with (ideally you would have two versions, record-wise and field-wise, and switch between them depending on the array size and the record field count in each particular case), and management operators in nested records can turn it into a breaking change, but it shows the kind of new solutions that batch processing opens up for future consideration.
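The two traversal orders can be sketched as follows. This is only an illustration: TRec, InitRecordWise and InitFieldWise are hypothetical, the managed fields are modelled as plain Pointer fields, and the offsets are computed by hand where real code would take them from RTTI.

```pascal
program FieldWiseSketch;
{$mode objfpc}{$H+}

type
  // Stand-in record; Name and Tag play the role of managed pointer fields.
  TRec = record
    Id: Integer;
    Name: Pointer;
    Tag: Pointer;
  end;

// Record-wise: finish all managed fields of one record, then move on.
procedure InitRecordWise(Data: PByte; Count, Stride: SizeInt;
  const Offsets: array of SizeInt);
var
  i, f: SizeInt;
begin
  for i := 0 to Count - 1 do
    for f := 0 to High(Offsets) do
      PPointer(Data + i * Stride + Offsets[f])^ := nil;
end;

// Field-wise: handle one field across the whole array, then the next field.
procedure InitFieldWise(Data: PByte; Count, Stride: SizeInt;
  const Offsets: array of SizeInt);
var
  i, f: SizeInt;
begin
  for f := 0 to High(Offsets) do
    for i := 0 to Count - 1 do
      PPointer(Data + i * Stride + Offsets[f])^ := nil;
end;

var
  A, B: array[0..3] of TRec;
  Offs: array[0..1] of SizeInt;
  i: Integer;
begin
  // Pre-fill with garbage so that the effect of initialization is visible.
  FillChar(A, SizeOf(A), $AA);
  FillChar(B, SizeOf(B), $AA);

  // Byte offsets of the "managed" fields inside TRec.
  Offs[0] := PByte(@A[0].Name) - PByte(@A[0]);
  Offs[1] := PByte(@A[0].Tag) - PByte(@A[0]);

  InitRecordWise(@A[0], Length(A), SizeOf(TRec), Offs);
  InitFieldWise(@B[0], Length(B), SizeOf(TRec), Offs);

  // Both orders must leave the same fields nil.
  for i := Low(A) to High(A) do
    if (A[i].Name <> nil) or (A[i].Tag <> nil) or
       (B[i].Name <> nil) or (B[i].Tag <> nil) then
      Halt(1);
  WriteLn('record-wise and field-wise traversals agree');
end.
```

For plain nil stores the two orders are indistinguishable in their result. They differ only in the order in which memory is visited, which is presumably where user-visible management operators could start to notice.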

Edited by Rika
