Adding PocketFFT support in FFT module since kissfft has some flaw in accuracy and performance
Eigen using KissFFT as the default fft implementation now,but it`s performance drops sharply in some specific situations since kissfft fail to handle "odd-sized" inputs (i.e., sizes with large factors) efficiently, which was proposed in #1717.In pocketfft, for lengths with very large prime factors, Bluestein's algorithm is used, and instead of an FFT of length n, a convolution of length n2 >= 2*n-1
is performed,where n is chosen to be highly composite.
In addition, the google/jax project uses eigen's fft as its backend before, which also has problems with performance and accuracy. You can see their discussion FFT precision/performance #2952. Currently, they have switched to pocketfft as their default fft implementation.
Prelininary performance comparison(complex to complex and -O3 optimization ):
-------------------------------------------
length kissfft pocketfft
-------------------------------------------
100000 7.03 ms 4.36 ms
100000*2 13.7 ms 9.67 ms
100001 14999 ms 24.3 ms
Updated benchmarks tested by fft_benchmark.cpp on AArch64 (time units:ns) :
Benchmark Time CPU Time Old Time New CPU Old CPU New
-------------------------------------------------------------------------------------------------------------------------
test_scalar_float/100 -0.1687 -0.1688 2555 2124 2554 2123
test_scalar_float/512 -0.4176 -0.4176 14773 8603 14767 8600
test_scalar_float/4096 -0.5413 -0.5413 158755 72818 158684 72781
test_scalar_float/32768 -0.4713 -0.4714 1639528 866753 1638255 866018
test_scalar_float/100000 -0.5012 -0.5012 6823375 3403716 6817207 3400439
test_scalar_float/100001 -0.9988 -0.9988 22048197988 25454072 22029408670 25420215
test_scalar_double/100 -0.1298 -0.1298 2421 2107 2420 2106
test_scalar_double/512 -0.3835 -0.3835 13892 8565 13885 8561
test_scalar_double/4096 -0.5060 -0.5060 153078 75624 153005 75581
test_scalar_double/32768 -0.4349 -0.4350 1912198 1080593 1910733 1079557
test_scalar_double/100000 -0.4256 -0.4256 7089696 4072526 7083264 4068571
test_scalar_double/100001 -0.9985 -0.9985 29955156310 44809112 29930479990 44739807
test_complex_float/100 -0.4638 -0.4638 4454 2388 4452 2387
test_complex_float/512 -0.6285 -0.6285 29882 11102 29869 11097
test_complex_float/4096 -0.6174 -0.6174 285366 109192 285238 109137
test_complex_float/32768 -0.6265 -0.6265 3698800 1381495 3695956 1380318
test_complex_float/100000 -0.6730 -0.6730 13653536 4465063 13642033 4460459
test_complex_float/100001 -0.9990 -0.9990 22388960173 22772174 22370358320 22745811
test_complex_double/100 -0.1601 -0.1602 4239 3560 4237 3558
test_complex_double/512 -0.6085 -0.6085 28463 11143 28452 11138
test_complex_double/4096 -0.5924 -0.5924 294522 120061 294352 119977
test_complex_double/32768 -0.5963 -0.5963 4087268 1650075 4083765 1648615
test_complex_double/100000 -0.5011 -0.5011 15675452 7820870 15659329 7811764
test_complex_double/100001 -0.9985 -0.9985 28986634903 43583605 28961759140 43521986