Tweak pasin_float, fix psqrt_complex
Reference issue
Fixes #2597 (closed)
`pasin_float`: Swapped out a comparison for some bit flipping, plus some other minor optimizations. This reduces runtime by ~11% (AVX).
size | before | after | diff |
---|---|---|---|
32 | 1080 | 949 | -12% |
64 | 1058 | 992 | -6% |
128 | 1089 | 912 | -16% |
256 | 1127 | 889 | -21% |
512 | 1086 | 882 | -18% |
1024 | 1014 | 845 | -16% |
2048 | 1223 | 952 | -22% |
4096 | 1125 | 856 | -23% |
8192 | 1270 | 1018 | -19% |
16384 | 1133 | 841 | -25% |
32768 | 1129 | 1021 | -9% |
65536 | 1067 | 880 | -17% |
131072 | 1145 | 861 | -24% |
262144 | 1125 | 982 | -12% |
524288 | 1199 | 937 | -21% |
1048576 | 1460 | 929 | -36% |
2097152 | 1220 | 1042 | -14% |
4194304 | 1431 | 1166 | -18% |
8388608 | 1885 | 1195 | -36% |
16777216 | 1798 | 1275 | -29% |
33554432 | 1485 | 1137 | -23% |
67108864 | 1373 | 1112 | -19% |
134217728 | 1338 | 1149 | -14% |
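
For readers unfamiliar with the trick, here is a rough scalar sketch of what "swapping a comparison for bit flipping" can look like in an odd function such as asin. It is not the code from this change: `asin_nonneg` is a hypothetical stand-in for the polynomial kernel, and the assumption is that the comparison in question is the sign test. The actual packet code may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern (and back) without undefined behavior.
inline std::uint32_t float_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}
inline float bits_float(std::uint32_t u) {
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Hypothetical stand-in for a kernel that is only evaluated on non-negative inputs.
inline float asin_nonneg(float a) { return std::asin(a); }

// Comparison-based version: test the sign of x, then select r or -r.
inline float asin_with_compare(float x) {
  const float r = asin_nonneg(std::fabs(x));
  return (x < 0.0f) ? -r : r;
}

// Bitwise version: isolate the sign bit, evaluate on |x|, XOR the sign back in.
inline float asin_with_sign_bits(float x) {
  constexpr std::uint32_t kSignMask = 0x80000000u;
  const std::uint32_t sign = float_bits(x) & kSignMask;
  const float a = bits_float(float_bits(x) & ~kSignMask);  // |x|
  const float r = asin_nonneg(a);
  // XOR transfers the sign of x only if r itself is non-negative (or NaN).
  assert(std::isnan(r) || !std::signbit(r));
  return bits_float(float_bits(r) ^ sign);
}
```

In a vectorized setting the second form needs only an AND and an XOR, with no compare or select. It also returns `-0.0f` for an input of `-0.0f`, which the comparison-based form does not.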
`psqrt_complex`: Fixed error handling so that, unless otherwise specified, if either the real or imaginary component is `nan`, then the result is `nan`. This is addressed before handling the special infinity cases. Overall, it is slower.
https://godbolt.org/z/vneGGcGjc
size | before | after | diff |
---|---|---|---|
32 | 464 | 527 | 13% |
64 | 513 | 567 | 10% |
128 | 498 | 544 | 9% |
256 | 499 | 563 | 12% |
512 | 492 | 514 | 4% |
1024 | 605 | 636 | 5% |
2048 | 468 | 540 | 15% |
4096 | 551 | 612 | 11% |
8192 | 467 | 548 | 17% |
16384 | 510 | 612 | 20% |
32768 | 466 | 571 | 22% |
65536 | 517 | 542 | 4% |
131072 | 520 | 608 | 16% |
262144 | 520 | 537 | 3% |
524288 | 510 | 577 | 13% |
1048576 | 497 | 589 | 18% |
2097152 | 505 | 583 | 15% |
4194304 | 551 | 566 | 2% |
8388608 | 514 | 600 | 16% |
16777216 | 514 | 511 | 0% |
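
The ordering described for `psqrt_complex` (default `nan` rule first, special infinity cases afterwards) can be modeled in scalar code roughly as follows. This is a sketch, not the vectorized kernel: `sqrt_nan_first` is a hypothetical name, and the infinity cases are assumed to follow the C99 Annex G rules for `csqrt`.

```cpp
#include <cmath>
#include <complex>
#include <limits>

// Hypothetical scalar model of the ordering; the real kernel operates on packets.
inline std::complex<float> sqrt_nan_first(std::complex<float> z) {
  const float re = z.real();
  const float im = z.imag();
  const float inf = std::numeric_limits<float>::infinity();
  const float nan = std::numeric_limits<float>::quiet_NaN();

  // General path for ordinary inputs.
  std::complex<float> result = std::sqrt(z);

  // Step 1: default rule, any nan component makes the whole result nan.
  if (std::isnan(re) || std::isnan(im)) {
    result = std::complex<float>(nan, nan);
  }

  // Step 2: special infinity cases (assumed: C99 Annex G), applied afterwards
  // so that they override the default from step 1 where the rules say so.
  if (std::isinf(im)) {
    // (x, +-inf) -> (+inf, +-inf) for every x, including nan.
    result = std::complex<float>(inf, im);
  } else if (re == inf) {
    // (+inf, y) -> (+inf, +-0); (+inf, nan) -> (+inf, nan).
    result = std::complex<float>(inf, std::isnan(im) ? nan : std::copysign(0.0f, im));
  } else if (re == -inf) {
    // (-inf, y) -> (0, +-inf); (-inf, nan) -> (nan, +-inf), sign unspecified.
    result = std::complex<float>(std::isnan(im) ? nan : 0.0f, std::copysign(inf, im));
  }
  return result;
}
```

With this ordering, `(nan, 1)` yields `(nan, nan)`, while `(nan, inf)` still yields `(inf, inf)` because the infinity case is one of the "otherwise specified" exceptions.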
Edited by Charles Schlosser