
Fix clippy warnings and make get_subvalue faster for 4x32b on aarch64.

Tobias Bergkvist requested to merge tobias/get-subvalue-aarch64 into main

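For some reason, using debug_assert! instead of assert! makes the 4x32b get_subvalue about 2.7x faster (roughly 62% less time per iteration in the benchmark output below), while the other cases show no change or stay within the noise threshold.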
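To illustrate the kind of change involved, here is a minimal sketch of the difference between the two macros. The `Packed4x32` type and both method names are hypothetical stand-ins, not the crate's actual `get_subvalue` implementation. The point is only that `assert!` keeps its check in every build profile, while `debug_assert!` is compiled out when debug assertions are disabled (the default for release builds).

```rust
/// Hypothetical stand-in for a packed 4x32b value; not the crate's real type.
#[derive(Clone, Copy)]
pub struct Packed4x32(pub u128);

impl Packed4x32 {
    /// Bounds check is present in every build profile, so optimized code
    /// still carries the comparison and the panic path.
    pub fn get_checked(self, i: usize) -> u32 {
        assert!(i < 4, "index out of range");
        (self.0 >> (32 * i)) as u32
    }

    /// Bounds check is only compiled in when debug assertions are enabled.
    /// In a default release build, an out-of-range index is no longer
    /// caught here, so callers must uphold `i < 4` themselves.
    pub fn get_debug_checked(self, i: usize) -> u32 {
        debug_assert!(i < 4, "index out of range");
        (self.0 >> (32 * i)) as u32
    }
}
```

Both methods return the same result for valid indices. Why removing the check helps only the 4x32b case is not explained by this sketch, and the merge request itself does not give a reason.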

packed_128/PackedBinaryField128x1b
                        time:   [6.9183 ns 6.9781 ns 7.0296 ns]
                        thrpt:  [4.5522 Gelem/s 4.5858 Gelem/s 4.6254 Gelem/s]
                 change:
                        time:   [-0.8306% -0.1802% +0.4135%] (p = 0.57 > 0.05)
                        thrpt:  [-0.4118% +0.1805% +0.8376%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high severe
packed_128/PackedBinaryField16x8b
                        time:   [8.3940 ns 8.4147 ns 8.4357 ns]
                        thrpt:  [3.7934 Gelem/s 3.8029 Gelem/s 3.8122 Gelem/s]
                 change:
                        time:   [+0.4602% +0.9139% +1.3541%] (p = 0.00 < 0.05)
                        thrpt:  [-1.3360% -0.9057% -0.4581%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
packed_128/PackedBinaryField4x32b
                        time:   [7.7293 ns 7.8239 ns 7.9940 ns]
                        thrpt:  [4.0030 Gelem/s 4.0900 Gelem/s 4.1401 Gelem/s]
                 change:
                        time:   [-62.874% -62.530% -62.015%] (p = 0.00 < 0.05)
                        thrpt:  [+163.26% +166.88% +169.35%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  3 (3.00%) high severe
packed_128/PackedBinaryField2x64b
                        time:   [8.1764 ns 8.2039 ns 8.2309 ns]
                        thrpt:  [3.8878 Gelem/s 3.9006 Gelem/s 3.9137 Gelem/s]
                 change:
                        time:   [+0.1504% +0.5846% +1.0020%] (p = 0.01 < 0.05)
                        thrpt:  [-0.9921% -0.5812% -0.1502%]
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild
packed_128/PackedBinaryField1x128b
                        time:   [7.3937 ns 7.4265 ns 7.4586 ns]
                        thrpt:  [4.2904 Gelem/s 4.3089 Gelem/s 4.3280 Gelem/s]
                 change:
                        time:   [-0.0134% +0.7602% +1.4971%] (p = 0.05 < 0.05)
                        thrpt:  [-1.4751% -0.7544% +0.0134%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
