
Modified sqrt/rsqrt for denormal handling.

This updates the new generic sqrt/rsqrt implementation from !868 to account for the following:

  • Better handling of std::numeric_limits<T>::denorm_min() (the original implementation incorrectly returned NaN on AVX512)
  • Better handling of denormals in general (results are often correct instead of being flushed to 0 or inf)
  • Faster sqrt and rsqrt on AVX512, at the cost of slightly slower rsqrt on SSE (no change on AVX)
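One common way to keep denormal inputs from being flushed is to rescale them by an even power of two so they become normal, run the usual estimate plus Newton-Raphson refinement, and undo the scaling afterwards. The scalar sketch below illustrates that idea only; it is not the code from this MR. The 0x5f3759df bit trick (standing in for a hardware estimate instruction), the three refinement steps, the 2^48 scale factor and all function names are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Crude estimate of 1/sqrt(x) using the well-known 0x5f3759df bit trick.
// Stands in for a hardware estimate instruction (assumption, illustration only).
inline float rsqrt_estimate(float x) {
  std::uint32_t i;
  std::memcpy(&i, &x, sizeof i);
  i = 0x5f3759dfu - (i >> 1);
  float y;
  std::memcpy(&y, &i, sizeof y);
  return y;
}

// Hypothetical helper: estimate plus three Newton-Raphson steps,
// y <- y * (1.5 - 0.5 * x * y * y). Only valid for positive, normal, finite x.
inline float rsqrt_normal(float x) {
  assert(x >= std::numeric_limits<float>::min() &&
         x <= std::numeric_limits<float>::max());
  float y = rsqrt_estimate(x);
  for (int step = 0; step < 3; ++step) {
    // x * y is formed first, so the intermediate stays near sqrt(x).
    y = y * (1.5f - 0.5f * (x * y) * y);
  }
  return y;
}

// Hypothetical denormal-aware rsqrt. Denormal inputs are multiplied by 2^48
// (exact), which makes them normal; since 1/sqrt(x * 2^48) = 2^-24 / sqrt(x),
// the result is multiplied back by 2^24.
float rsqrt_denormal_safe(float x) {
  if (!(x >= 0.0f)) return std::numeric_limits<float>::quiet_NaN();  // x < 0 or NaN
  if (x == 0.0f) return std::copysign(std::numeric_limits<float>::infinity(), x);
  if (std::isinf(x)) return 0.0f;
  if (x < std::numeric_limits<float>::min()) {
    return rsqrt_normal(x * std::ldexp(1.0f, 48)) * std::ldexp(1.0f, 24);
  }
  return rsqrt_normal(x);
}

// Hypothetical denormal-aware sqrt computed as x * rsqrt(x). Zero and +inf are
// returned unchanged, because x * rsqrt(x) would evaluate 0 * inf = NaN at zero.
float sqrt_denormal_safe(float x) {
  if (!(x >= 0.0f)) return std::numeric_limits<float>::quiet_NaN();
  if (x == 0.0f || std::isinf(x)) return x;
  if (x < std::numeric_limits<float>::min()) {
    // sqrt(x * 2^48) = 2^24 * sqrt(x), so scale the result back by 2^-24.
    const float xs = x * std::ldexp(1.0f, 48);
    return xs * rsqrt_normal(xs) * std::ldexp(1.0f, -24);
  }
  return x * rsqrt_normal(x);
}
```

Because both scale factors are powers of two and the scaled values stay in the normal range, the rescaling itself adds no rounding error. For example, denorm_min() = 2^-149 is mapped to 2^-101 before the refinement, and rsqrt(denorm_min()) = 2^74.5 is comfortably representable, so neither 0 nor inf is the right answer there.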

Google Benchmark comparison output (only significant changes shown; negative values in the Time and CPU columns are speedups):

Comparing ./sqrt_old_sse4.2 to ./sqrt_new_sse4.2
Benchmark                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------
BM_Rsqrt<float>/8/1                    +0.1165         +0.1165             5             5             5             5
BM_Rsqrt<float>/64/1                   +0.1355         +0.1355            25            28            25            28
BM_Rsqrt<float>/512/1                  +0.1340         +0.1340           195           221           195           221
BM_Rsqrt<float>/2048/1                 +0.0715         +0.0714          1016          1089          1016          1089

Comparing ./sqrt_old_avx512dq to ./sqrt_new_avx512dq
Benchmark                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------
BM_Sqrt<float>/8/1                     -0.0226         -0.0226             9             8             9             8
BM_Sqrt<float>/64/1                    -0.3050         -0.3050            14             9            14             9
BM_Sqrt<float>/512/1                   -0.3282         -0.3282           104            70           104            70
BM_Sqrt<float>/2048/1                  -0.2790         -0.2790           469           338           469           338
BM_Sqrt<double>/8/1                    -0.1990         -0.1990             5             4             5             4
BM_Sqrt<double>/64/1                   -0.2366         -0.2366            34            26            34            26
BM_Sqrt<double>/512/1                  -0.2236         -0.2236           313           243           313           243
BM_Sqrt<double>/2048/1                 -0.2237         -0.2237          1287           999          1287           999
BM_Rsqrt<float>/8/1                    +0.0166         +0.0165             5             5             5             5
BM_Rsqrt<float>/64/1                   -0.0715         -0.0715            11            10            11            10
BM_Rsqrt<float>/512/1                  -0.1097         -0.1097            82            73            82            73
BM_Rsqrt<float>/2048/1                 -0.1323         -0.1323           387           335           387           335
BM_Rsqrt<double>/8/1                   -0.0874         -0.0874             5             5             5             5
BM_Rsqrt<double>/64/1                  -0.1198         -0.1198            31            27            31            27
BM_Rsqrt<double>/512/1                 -0.1499         -0.1499           287           244           287           244
BM_Rsqrt<double>/2048/1                -0.1728         -0.1727          1181           977          1181           977
OVERALL_GEOMEAN                        -0.1616         -0.1616             0             0             0             0
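Assuming Google Benchmark's compare.py conventions, each Time/CPU entry is (new - old) / |old| for that row, and OVERALL_GEOMEAN is the same change computed between the geometric means of all old and all new times. The sketch below shows that arithmetic; the function names are hypothetical and the test values are illustrative, not taken from the tables.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Relative change as printed in the Time/CPU columns (assumed compare.py
// convention): negative means the new build is faster.
double relative_change(double old_t, double new_t) {
  return (new_t - old_t) / std::fabs(old_t);
}

// Geometric mean via the mean of logarithms; requires non-empty, positive times.
double geomean(const std::vector<double>& times) {
  assert(!times.empty());
  double log_sum = 0.0;
  for (double t : times) {
    assert(t > 0.0);
    log_sum += std::log(t);
  }
  return std::exp(log_sum / static_cast<double>(times.size()));
}

// Overall summary: relative change between the geometric means of old and new times.
double overall_geomean_change(const std::vector<double>& old_times,
                              const std::vector<double>& new_times) {
  return relative_change(geomean(old_times), geomean(new_times));
}
```

The Time Old/New columns are rounded, which is why BM_Rsqrt<float>/8/1 shows 5 → 5 alongside +0.1165, and why recomputing from the printed values only approximately matches (1016 → 1089 gives about +0.072 versus the printed +0.0715). Since only significant changes are listed, the overall geomean presumably also covers benchmarks omitted here, and its 0 entries are most likely the geometric-mean times shown in seconds and rounded.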
Edited by Antonio Sánchez
