The source project of this merge request has been removed.
Implement a generic vectorized version of Smith's algorithm for complex division.
Reference issue
Closes #2277 (closed)
What does this implement/fix?
The existing implementation of complex division is susceptible to under- and overflow. This change implements Smith's algorithm, which rescales the denominator to avoid such problems. The new implementation is about ~2x slower than the naive implementation, but makes Eigen pass the LAPACK linear solver test suite.
Additional information
Benchmark numbers for SSE:
name old cpu/op new cpu/op delta
BM_Div/1 18.9ns ± 0% 18.9ns ± 0% -0.02%
BM_Div/8 5.98ns ± 0% 11.76ns ± 1% +96.63%
BM_Div/64 40.4ns ± 0% 90.3ns ± 1% +123.32%
BM_Div/512 325ns ± 0% 728ns ± 1% +123.73%
BM_Div/4k 2.74µs ± 0% 5.82µs ± 1% +112.92%
BM_Div/32k 22.9µs ± 6% 46.5µs ± 1% +103.40%
BM_Div/256k 263µs ± 2% 385µs ± 1% +46.49%
BM_Div/1M 1.26ms ±12% 1.75ms ± 9% +38.05%
Edited by Rasmus Munk Larsen