Skip to content

Improve generic Index*.

I followed one of my relatively hot paths and noticed a signed division produced by subtracting typed pointers in generic IndexQWord (x86-64 uses it, as well as IndexDWord). Hence, I have four ideas about generic Index* implementations, from least to most dubious.

  1. Force unsigned division.

Patch: index_v1.patch.

  1. Do the above plus force (x >= CONST1) and (x <= CONST2)unsigned(x - CONST1) <= CONST2 - CONST1 optimization (unless #39306 (closed) fixed).

Patch: index_v2.patch, incremental from v1: index_v1to2.patch.

  1. Do everything of the above plus speculate pend calculation (which is also used in the condition).

Patch: index_v3.patch, incremental from v2: index_v2to3.patch.

  1. Do everything of the above plus tighten loops, at the cost of the extra comparison at the end.

Patch: index_v4.patch, incremental from v3: index_v3to4.patch.

I don’t have meaningful benchmarks though (or, rather, I see 2× speedup on x86-64 both for IndexDWord and IndexQWord mainly due to #4, but don’t want to believe it so easily). My results with index_benchmark.pas on

x86-32 (System.IndexByte, System.IndexWord and System.IndexDWord are special)

System.IndexByte       IndexByteV1 (80 b code)    IndexByteV4 (96 b code)
n=1:   3.3 ns/call     n=1:   2.9 ns/call         n=1:   2.5 ns/call
n=10:  3.7 ns/call     n=10:  7.7 ns/call         n=10:  3.7 ns/call
n=100: 10.0 ns/call    n=100: 53.6 ns/call        n=100: 24.6 ns/call

System.IndexWord       IndexWordV1 (96 b code)    IndexWordV4 (80 b code)
n=1:   15.7 ns/call    n=1:   3.2 ns/call         n=1:   2.7 ns/call
n=10:  18.2 ns/call    n=10:  5.9 ns/call         n=10:  4.2 ns/call
n=100: 39.7 ns/call    n=100: 32.3 ns/call        n=100: 25.4 ns/call

System.IndexDWord      IndexDWordV1 (112 b code)  IndexDWordV4 (96 b code)
n=1:   15.7 ns/call    n=1:   2.9 ns/call         n=1:   2.8 ns/call
n=10:  17.9 ns/call    n=10:  6.0 ns/call         n=10:  4.9 ns/call
n=100: 39.5 ns/call    n=100: 34.3 ns/call        n=100: 33.4 ns/call

System.IndexQWord      IndexQWordV1 (128 b code)  IndexQWordV4 (112 b code)
n=1:   3.7 ns/call     n=1:   3.5 ns/call         n=1:   3.8 ns/call
n=10:  5.7 ns/call     n=10:  5.4 ns/call         n=10:  6.1 ns/call
n=100: 38.5 ns/call    n=100: 38.4 ns/call        n=100: 33.8 ns/call

x86-64 (System.IndexByte and System.IndexWord are special)

System.IndexByte       IndexByteV1 (96 b code)    IndexByteV4 (80 b code)
n=1:   3.3 ns/call     n=1:   2.5 ns/call         n=1:   2.4 ns/call
n=10:  2.8 ns/call     n=10:  5.0 ns/call         n=10:  3.3 ns/call
n=100: 4.6 ns/call     n=100: 33.8 ns/call        n=100: 23.0 ns/call

System.IndexWord       IndexWordV1 (96 b code)    IndexWordV4 (80 b code)
n=1:   3.7 ns/call     n=1:   3.0 ns/call         n=1:   2.3 ns/call
n=10:  3.6 ns/call     n=10:  6.4 ns/call         n=10:  3.6 ns/call
n=100: 6.7 ns/call     n=100: 45.1 ns/call        n=100: 23.0 ns/call

System.IndexDWord      IndexDWordV1 (112 b code)  IndexDWordV4 (96 b code)
n=1:   3.2 ns/call     n=1:   3.2 ns/call         n=1:   3.2 ns/call
n=10:  6.2 ns/call     n=10:  6.5 ns/call         n=10:  3.7 ns/call
n=100: 45.7 ns/call    n=100: 44.2 ns/call        n=100: 23.8 ns/call

System.IndexQWord      IndexQWordV1 (112 b code)  IndexQWordV4 (96 b code)
n=1:   2.9 ns/call     n=1:   3.2 ns/call         n=1:   2.5 ns/call
n=10:  6.2 ns/call     n=10:  6.8 ns/call         n=10:  4.0 ns/call
n=100: 44.8 ns/call    n=100: 45.2 ns/call        n=100: 25.7 ns/call
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information