Faster large set operations. (!469) · Merge requests · FPC / FPC / FPC Source

Rika requested to merge runewalsh/source:setgenops into main Aug 02, 2023

This complaint matches my experience. Set functions can and should be improved.

Let’s start with something simple, generic and endianness-agnostic. (As long as I don’t know if large sets work differently on big-endian platforms. Probably not, i.e. ith bit is always pByte(@theset)[i div 8] shr (i mod 8) and 1?)

This MR greatly improves generic implementations of +, −, *, ><, =, <= over large sets on platforms that allow unaligned accesses, in particular both on i386 and x86-64. (I suspect that these functions 1) never get arguments not aligned on a PtrUint boundary when size >= sizeof(PtrUint), 2) never have tails I handle, but without a guarantee I’m playing safe.)

Benchmark: SetOpsBenchmark.pas.

My results

i386

set + set (system):    20 ns/call
set + set (orig):      17 ns/call
set + set (new):       5.7 ns/call

set = set (system):    18 ns/call
set = set (orig):      18 ns/call
set = set (new):       3.9 ns/call

set <= set (system):   74 ns/call
set <= set (orig):     73 ns/call
set <= set (new):      4.3 ns/call

x86-64

set + set (system):    17 ns/call
set + set (orig):      16 ns/call
set + set (new):       3.6 ns/call

set = set (system):    13 ns/call
set = set (orig):      17 ns/call
set = set (new):       4.0 ns/call

set <= set (system):   24 ns/call
set <= set (orig):     24 ns/call
set <= set (new):      2.7 ns/call

Edited Aug 02, 2023 by Rika

Faster large set operations.

Merge request reports