Tweak pasin_float, fix psqrt_complex
Reference issue
Fixes #2597 (closed)
pasin_float:
Replaced a comparison with a bit flip and made some other minor optimizations. This reduces runtime by ~11% (AVX).
| size | before | after | diff |
|---|---|---|---|
| 32 | 1080 | 949 | -12% |
| 64 | 1058 | 992 | -6% |
| 128 | 1089 | 912 | -16% |
| 256 | 1127 | 889 | -21% |
| 512 | 1086 | 882 | -18% |
| 1024 | 1014 | 845 | -16% |
| 2048 | 1223 | 952 | -22% |
| 4096 | 1125 | 856 | -23% |
| 8192 | 1270 | 1018 | -19% |
| 16384 | 1133 | 841 | -25% |
| 32768 | 1129 | 1021 | -9% |
| 65536 | 1067 | 880 | -17% |
| 131072 | 1145 | 861 | -24% |
| 262144 | 1125 | 982 | -12% |
| 524288 | 1199 | 937 | -21% |
| 1048576 | 1460 | 929 | -36% |
| 2097152 | 1220 | 1042 | -14% |
| 4194304 | 1431 | 1166 | -18% |
| 8388608 | 1885 | 1195 | -36% |
| 16777216 | 1798 | 1275 | -29% |
| 33554432 | 1485 | 1137 | -23% |
| 67108864 | 1373 | 1112 | -19% |
| 134217728 | 1338 | 1149 | -14% |
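As a rough illustration of the comparison-to-bit-flip idea mentioned for pasin_float, here is a minimal scalar sketch. It is not the code from this change: the helper names `apply_sign_bitwise` and `apply_sign_compare` are hypothetical, and it assumes the comparison being replaced is a sign test used to restore the sign of the result (asin is an odd function).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not Eigen's packet code: give the non-negative
// magnitude `r` the sign of `x` by XOR-ing in x's sign bit.
inline float apply_sign_bitwise(float r, float x) {
  std::uint32_t r_bits, x_bits;
  std::memcpy(&r_bits, &r, sizeof r_bits);
  std::memcpy(&x_bits, &x, sizeof x_bits);
  const std::uint32_t sign_mask = 0x80000000u;
  r_bits ^= (x_bits & sign_mask);  // flips r's sign bit only if x's sign bit is set
  float out;
  std::memcpy(&out, &r_bits, sizeof out);
  return out;
}

// Comparison-based version of the same step, for reference.
inline float apply_sign_compare(float r, float x) {
  return x < 0.0f ? -r : r;
}
```

The two versions agree for ordinary nonzero inputs. They differ for `x = -0.0f`, where the bitwise version propagates the negative sign and the comparison does not. In a vectorized kernel the bitwise form would map to a mask-and-XOR pair instead of a compare plus select, which is one plausible source of the speedup.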
psqrt_complex: Fixed NaN handling so that, unless otherwise specified, the result is NaN if either the real or imaginary component is NaN. This is handled before the special infinity cases. Overall, the function is now slower, by 0% to 22% in the measurements below.
https://godbolt.org/z/vneGGcGjc
| size | before | after | diff |
|---|---|---|---|
| 32 | 464 | 527 | 13% |
| 64 | 513 | 567 | 10% |
| 128 | 498 | 544 | 9% |
| 256 | 499 | 563 | 12% |
| 512 | 492 | 514 | 4% |
| 1024 | 605 | 636 | 5% |
| 2048 | 468 | 540 | 15% |
| 4096 | 551 | 612 | 11% |
| 8192 | 467 | 548 | 17% |
| 16384 | 510 | 612 | 20% |
| 32768 | 466 | 571 | 22% |
| 65536 | 517 | 542 | 4% |
| 131072 | 520 | 608 | 16% |
| 262144 | 520 | 537 | 3% |
| 524288 | 510 | 577 | 13% |
| 1048576 | 497 | 589 | 18% |
| 2097152 | 505 | 583 | 15% |
| 4194304 | 551 | 566 | 2% |
| 8388608 | 514 | 600 | 16% |
| 16777216 | 514 | 511 | 0% |
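The NaN rule described for psqrt_complex can be sketched in scalar form as follows. This is an illustration under stated assumptions, not the packet implementation: the function name `sqrt_with_nan_rule` is hypothetical, and it assumes the "otherwise specified" cases are the C99 Annex G infinity rules. Only one of those is shown here, namely that an infinite imaginary part yields (+inf, ±inf) even when the real part is NaN.

```cpp
#include <cmath>
#include <complex>
#include <limits>

// Hypothetical scalar sketch, not Eigen's code.
inline std::complex<float> sqrt_with_nan_rule(std::complex<float> z) {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  const float inf = std::numeric_limits<float>::infinity();
  const float re = z.real();
  const float im = z.imag();

  // General path.
  std::complex<float> result = std::sqrt(z);

  // Default rule, applied first: any NaN component makes the result NaN.
  if (std::isnan(re) || std::isnan(im)) {
    result = std::complex<float>(nan, nan);
  }

  // Infinity special case, applied afterwards so it overrides the NaN rule:
  // sqrt(x +/- i*inf) = +inf +/- i*inf for every x, including NaN.
  if (std::isinf(im)) {
    result = std::complex<float>(inf, im);
  }
  return result;
}
```

Applying the NaN rule first and the infinity case second mirrors a chain of selects in which later selects take priority. The extra NaN test is additional work on every input, which is consistent with the slowdown in the table above.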
Edited by Charles Schlosser