Faster generic fpc_varset_set_range (used for [L .. R] set constructors when L or R aren’t constants).

Rika requested to merge runewalsh/source:setrange into main

This speeds up genset.inc:fpc_varset_set_range, which was the original subject of the complaint in !469 (merged). I kept it separate from !469 because I’m unsure whether it works on big-endian platforms; could someone please check? 😭 For now, I assume big-endian sets use the same byte order but a reversed bit order within each byte, according to this comment (what’s the rationale behind this design, btw?).
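To make the bit-order assumption concrete, here is a minimal sketch of one way to fill a range with byte masks instead of setting elements one at a time. It is not the code from this merge request: `TByteSet`, `SetRangeMasked`, `SetRangeNaive` and the `reverseBits` flag are hypothetical names for illustration, and the `reverseBits = true` path only encodes the assumption stated above (same byte order, bits reversed within each byte), which is still unverified on real big-endian targets.

```pascal
program SetRangeSketch;
{$mode objfpc}

type
  TByteSet = array[0..31] of byte; { raw storage for a 256-element set }

{ Hypothetical helper: include elements lo..hi using two edge masks
  plus one FillChar for the whole bytes in between. }
procedure SetRangeMasked(var dest: TByteSet; lo, hi: integer; reverseBits: boolean);
var
  firstByte, lastByte: integer;
  firstMask, lastMask: byte;
begin
  if lo > hi then
    exit;
  firstByte := lo shr 3;
  lastByte := hi shr 3;
  if reverseBits then
  begin
    { assumed big-endian layout: element e lives in bit 7 - (e mod 8) }
    firstMask := $FF shr (lo and 7);
    lastMask := ($FF shl (7 - (hi and 7))) and $FF;
  end
  else
  begin
    { little-endian layout: element e lives in bit e mod 8 }
    firstMask := ($FF shl (lo and 7)) and $FF;
    lastMask := $FF shr (7 - (hi and 7));
  end;
  if firstByte = lastByte then
    dest[firstByte] := dest[firstByte] or (firstMask and lastMask)
  else
  begin
    dest[firstByte] := dest[firstByte] or firstMask;
    if lastByte - firstByte > 1 then
      FillChar(dest[firstByte + 1], lastByte - firstByte - 1, $FF);
    dest[lastByte] := dest[lastByte] or lastMask;
  end;
end;

{ Reference version: one element at a time, same layout rules. }
procedure SetRangeNaive(var dest: TByteSet; lo, hi: integer; reverseBits: boolean);
var
  e, bit: integer;
begin
  for e := lo to hi do
  begin
    bit := e and 7;
    if reverseBits then
      bit := 7 - bit;
    dest[e shr 3] := dest[e shr 3] or (1 shl bit);
  end;
end;

var
  a, b: TByteSet;
  lo, hi: integer;
  rev, ok: boolean;
begin
  ok := true;
  { exhaustively compare both versions for every lo <= hi and both layouts }
  for rev := false to true do
    for lo := 0 to 255 do
      for hi := lo to 255 do
      begin
        FillChar(a, SizeOf(a), 0);
        FillChar(b, SizeOf(b), 0);
        SetRangeMasked(a, lo, hi, rev);
        SetRangeNaive(b, lo, hi, rev);
        if CompareByte(a, b, SizeOf(a)) <> 0 then
          ok := false;
      end;
  if ok then
    writeln('masked and naive versions agree')
  else
    writeln('MISMATCH');
end.
```

The exhaustive comparison only shows that the masked version matches the naive loop under each layout rule; it says nothing about whether the reversed-bit rule is what big-endian targets actually use, so a run on real big-endian hardware is still needed.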

Benchmark: SetRangeBenchmark.pas.

My results

x86-64

fpc_varset_set_range_orig(len = 1):       4.2 ns/call
fpc_varset_set_range_new (len = 1):       5.1 ns/call

fpc_varset_set_range_orig(len = 8~12):    16 ns/call
fpc_varset_set_range_new (len = 8~12):    9.3 ns/call

fpc_varset_set_range_orig(len = 15~25):   26 ns/call
fpc_varset_set_range_new (len = 15~25):   7.5 ns/call

fpc_varset_set_range_orig(len = 30~60):   49 ns/call
fpc_varset_set_range_new (len = 30~60):   8.6 ns/call

fpc_varset_set_range_orig(len = 100~255): 168 ns/call
fpc_varset_set_range_new (len = 100~255): 8.9 ns/call

i386

fpc_varset_set_range_orig(len = 1):       5.9 ns/call
fpc_varset_set_range_new (len = 1):       6.2 ns/call

fpc_varset_set_range_orig(len = 8~12):    18 ns/call
fpc_varset_set_range_new (len = 8~12):    10 ns/call

fpc_varset_set_range_orig(len = 15~25):   28 ns/call
fpc_varset_set_range_new (len = 15~25):   12 ns/call

fpc_varset_set_range_orig(len = 30~60):   53 ns/call
fpc_varset_set_range_new (len = 30~60):   15 ns/call

fpc_varset_set_range_orig(len = 100~255): 185 ns/call
fpc_varset_set_range_new (len = 100~255): 21 ns/call
