Improve generic Index*.
I followed one of my relatively hot paths and noticed a signed division produced by subtracting typed pointers in generic `IndexQWord` (`x86-64` uses it, as well as `IndexDWord`). Hence, I have four ideas about generic `Index*` implementations, from least to most dubious.
1. Force unsigned division.
Patch: [index_v1.patch](/uploads/bd76472a2c9f3969d1d12c2fa3cdbcc6/index_v1.patch).
2. Do the above plus force `(x >= CONST1) and (x <= CONST2)` → `unsigned(x - CONST1) <= CONST2 - CONST1` optimization (unless #39306 fixed).
Patch: [index_v2.patch](/uploads/4d173a4e8d1bae3c749b5f950b3239a0/index_v2.patch), incremental from v1: [index_v1to2.patch](/uploads/bfed3357ca39b38510a53178232ad35d/index_v1to2.patch).
3. Do everything of the above plus speculate `pend` calculation (which is also used in the condition).
Patch: [index_v3.patch](/uploads/9699d74cad771adfc65989850ec6f765/index_v3.patch), incremental from v2: [index_v2to3.patch](/uploads/a2a85a7acc2417b1b3734f59567fc028/index_v2to3.patch).
4. Do everything of the above plus tighten loops, at the cost of the extra comparison at the end.
Patch: [index_v4.patch](/uploads/b735274d584ad04308726eecd2007a77/index_v4.patch), incremental from v3: [index_v3to4.patch](/uploads/00519a142fad2c53dc39b31596deae3f/index_v3to4.patch).
I don’t have meaningful benchmarks though (or, rather, I see 2× speedup on x86-64 both for `IndexDWord` and `IndexQWord` **mainly due to #4**, but don’t want to believe it so easily). My results with [index_benchmark.pas](/uploads/42c76a24a135bac6ae9d3831eae6aaef/index_benchmark.pas) on
x86-32 (`System.IndexByte`, `System.IndexWord` and `System.IndexDWord` are special)
```
System.IndexByte IndexByteV1 (80 b code) IndexByteV4 (96 b code)
n=1: 3.3 ns/call n=1: 2.9 ns/call n=1: 2.5 ns/call
n=10: 3.7 ns/call n=10: 7.7 ns/call n=10: 3.7 ns/call
n=100: 10.0 ns/call n=100: 53.6 ns/call n=100: 24.6 ns/call
System.IndexWord IndexWordV1 (96 b code) IndexWordV4 (80 b code)
n=1: 15.7 ns/call n=1: 3.2 ns/call n=1: 2.7 ns/call
n=10: 18.2 ns/call n=10: 5.9 ns/call n=10: 4.2 ns/call
n=100: 39.7 ns/call n=100: 32.3 ns/call n=100: 25.4 ns/call
System.IndexDWord IndexDWordV1 (112 b code) IndexDWordV4 (96 b code)
n=1: 15.7 ns/call n=1: 2.9 ns/call n=1: 2.8 ns/call
n=10: 17.9 ns/call n=10: 6.0 ns/call n=10: 4.9 ns/call
n=100: 39.5 ns/call n=100: 34.3 ns/call n=100: 33.4 ns/call
System.IndexQWord IndexQWordV1 (128 b code) IndexQWordV4 (112 b code)
n=1: 3.7 ns/call n=1: 3.5 ns/call n=1: 3.8 ns/call
n=10: 5.7 ns/call n=10: 5.4 ns/call n=10: 6.1 ns/call
n=100: 38.5 ns/call n=100: 38.4 ns/call n=100: 33.8 ns/call
```
x86-64 (`System.IndexByte` and `System.IndexWord` are special)
```
System.IndexByte IndexByteV1 (96 b code) IndexByteV4 (80 b code)
n=1: 3.3 ns/call n=1: 2.5 ns/call n=1: 2.4 ns/call
n=10: 2.8 ns/call n=10: 5.0 ns/call n=10: 3.3 ns/call
n=100: 4.6 ns/call n=100: 33.8 ns/call n=100: 23.0 ns/call
System.IndexWord IndexWordV1 (96 b code) IndexWordV4 (80 b code)
n=1: 3.7 ns/call n=1: 3.0 ns/call n=1: 2.3 ns/call
n=10: 3.6 ns/call n=10: 6.4 ns/call n=10: 3.6 ns/call
n=100: 6.7 ns/call n=100: 45.1 ns/call n=100: 23.0 ns/call
System.IndexDWord IndexDWordV1 (112 b code) IndexDWordV4 (96 b code)
n=1: 3.2 ns/call n=1: 3.2 ns/call n=1: 3.2 ns/call
n=10: 6.2 ns/call n=10: 6.5 ns/call n=10: 3.7 ns/call
n=100: 45.7 ns/call n=100: 44.2 ns/call n=100: 23.8 ns/call
System.IndexQWord IndexQWordV1 (112 b code) IndexQWordV4 (96 b code)
n=1: 2.9 ns/call n=1: 3.2 ns/call n=1: 2.5 ns/call
n=10: 6.2 ns/call n=10: 6.8 ns/call n=10: 4.0 ns/call
n=100: 44.8 ns/call n=100: 45.2 ns/call n=100: 25.7 ns/call
```
issue