Skip to content

[Packages] Improvement to TArrayHelper's sorting algorithm

Summary

This merge request improves the sorting algorithm (Quicksort with Heapsort if it becomes degenerate) by switching to Insertion Sort once the partition size falls below a certain threshold (set to 10 currently). This allows it to run slightly faster due to reduced overhead. This brings the algorithm more in line with Introsort.

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

The TArrayHelper<>.Sort routine gains a slight speed gain.

Relevant logs and/or screenshots

Using tqsort_killer as a benchmark:

--- strings ---

WITHOUT INSERTION SORT                              WITH INSERTION SORT
----------------------                              -------------------

Order: random.
n = 1500:   2.5 ms/sort, OK                         2.0 ms/sort, OK
n = 3000:   5.2 ms/sort (2.1x from previous), OK    4.2 ms/sort (2.1x from previous), OK
n = 6000:   11.8 ms/sort (2.3x from previous), OK   10.2 ms/sort (2.5x from previous), OK

Order: QSort killer.
n = 1500:   5.8 ms/sort, OK                         5.5 ms/sort, OK
n = 3000:   12.4 ms/sort (2.2x from previous), OK   12.4 ms/sort (2.2x from previous), OK
n = 6000:   27.6 ms/sort (2.2x from previous), OK   26.6 ms/sort (2.2x from previous), OK

--- float32's ---

WITHOUT INSERTION SORT                  	    WITH INSERTION SORT
----------------------                              -------------------

Order: random.
n = 10000:  0.9 ms/sort, OK                         0.8 ms/sort, OK
n = 20000:  2.0 ms/sort (2.1x from previous), OK    1.7 ms/sort (2.1x from previous), OK
n = 40000:  4.0 ms/sort (2.0x from previous), OK    3.7 ms/sort (2.2x from previous), OK

Order: QSort killer.
n = 10000:  1.4 ms/sort, OK                         1.4 ms/sort, OK
n = 20000:  3.1 ms/sort (2.2x from previous), OK    3.0 ms/sort (2.1x from previous), OK
n = 40000:  6.6 ms/sort (2.1x from previous), OK    6.5 ms/sort (2.1x from previous), OK

Times are about the same for float32, but an improvement is seen with strings. Under heavy CPU loads, the improvement is more pronounced:

--- strings ---

WITHOUT INSERTION SORT                  	    WITH INSERTION SORT
----------------------                  	    -------------------

Order: random.
n = 1500:   5.7 ms/sort, OK             	    3.4 ms/sort, OK
n = 3000:   12.2 ms/sort (2.1x from previous), OK   9.3 ms/sort (2.7x from previous), OK
n = 6000:   29.4 ms/sort (2.4x from previous), OK   20.4 ms/sort (2.2x from previous), OK

Order: QSort killer.
n = 1500:   14.5 ms/sort, OK                	    9.9 ms/sort, OK
n = 3000:   32.6 ms/sort (2.2x from previous), OK   23.2 ms/sort (2.4x from previous), OK
n = 6000:   83.3 ms/sort (2.6x from previous), OK   57.7 ms/sort (2.5x from previous), OK

--- float32's ---

WITHOUT INSERTION SORT                  	    WITH INSERTION SORT
----------------------                  	    -------------------

Order: random.
n = 10000:  1.5 ms/sort, OK          	            1.2 ms/sort, OK
n = 20000:  3.2 ms/sort (2.2x from previous), OK    2.6 ms/sort (2.1x from previous), OK
n = 40000:  6.7 ms/sort (2.1x from previous), OK    6.0 ms/sort (2.3x from previous), OK

Order: QSort killer.
n = 10000:  2.7 ms/sort, OK            	            2.4 ms/sort, OK
n = 20000:  5.6 ms/sort (2.1x from previous), OK    5.1 ms/sort (2.1x from previous), OK
n = 40000:  11.9 ms/sort (2.1x from previous), OK   11.5 ms/sort (2.3x from previous), OK
Edited by J. Gareth "Kit" Moreton

Merge request reports