Use more FMA in reciprocal iteration for precision
Description
The typical Newton iteration for the reciprocal is mul(x, fma(-a, x, 2)), which can also be written as fma(x, fma(-a, x, 1), x). Using two levels of fma operations improves overall precision.
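For reference, the two forms can be sketched in scalar C++ as below. This is only an illustration using `std::fma`, not the vectorized code touched by this change; the function names `newton_step_mul` and `newton_step_fma` are hypothetical. Both compute x * (2 - a*x). In the second form the residual 1 - a*x is small and is produced with a single rounding, so fewer of its low bits should be lost than when 2 - a*x is rounded at a magnitude near 1.

```cpp
#include <cmath>

// Current form, mul(x, fma(-a, x, 2)): computes x * (2 - a*x).
// The inner fma result is close to 1, so it is rounded at that magnitude
// before the final multiply rounds again.
// (Function name is hypothetical, for illustration only.)
inline float newton_step_mul(float a, float x) {
  return x * std::fma(-a, x, 2.0f);
}

// Proposed form, fma(x, fma(-a, x, 1), x): computes x + x * (1 - a*x).
// The inner fma yields the small residual e = 1 - a*x with one rounding,
// and the outer fma applies the correction x*e + x with one more rounding.
// (Function name is hypothetical, for illustration only.)
inline float newton_step_fma(float a, float x) {
  return std::fma(x, std::fma(-a, x, 1.0f), x);
}
```

Either function takes the input `a` and a rough estimate `x` of 1/a and returns a refined estimate; the two are algebraically identical and differ only in how rounding errors accumulate.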
Reference issue
Additional information
Tested using https://gitlab.com/-/snippets/4903338, which compares how often each method's result is nearer to the canonical 1/x result.
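A rough sketch of this kind of comparison is shown below. It is not the linked snippet. It assumes the reference is 1/a evaluated in double precision, draws inputs only from [1, 2), and uses a made-up starting estimate with a relative error of about 2^-12; the names `Tally` and `compare_steps` are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Counts, per input, which refinement form lands nearer to the reference.
struct Tally {
  std::uint64_t mul_closer = 0;  // mul(x, fma(-a, x, 2)) is nearer
  std::uint64_t fma_closer = 0;  // fma(x, fma(-a, x, 1), x) is nearer
  std::uint64_t equal = 0;       // both are equally near
};

inline Tally compare_steps(std::uint64_t samples, unsigned seed) {
  std::mt19937 gen(seed);
  // Assumption: inputs from [1, 2) only, not the full normal/denormal range.
  std::uniform_real_distribution<float> dist(1.0f, 2.0f);
  Tally t;
  for (std::uint64_t i = 0; i < samples; ++i) {
    const float a = dist(gen);
    // Assumption: the reference is the double-precision reciprocal.
    const double exact = 1.0 / static_cast<double>(a);
    // Hypothetical rough estimate: float reciprocal perturbed by about 2^-12.
    const float x0 = (1.0f / a) * (1.0f + 1.0f / 4096.0f);
    // One Newton step in each form.
    const float r_mul = x0 * std::fma(-a, x0, 2.0f);
    const float r_fma = std::fma(x0, std::fma(-a, x0, 1.0f), x0);
    const double e_mul = std::fabs(r_mul - exact);
    const double e_fma = std::fabs(r_fma - exact);
    if (e_mul < e_fma) {
      ++t.mul_closer;
    } else if (e_fma < e_mul) {
      ++t.fma_closer;
    } else {
      ++t.equal;
    }
  }
  return t;
}
```

Dividing each counter by `samples` gives percentages in the same spirit as the tables here, though the exact figures will depend on the input distribution and the starting estimate.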
1 iteration (1,000,000,000 random normal numbers and 10,000,000 denormal numbers):
(Reading the first row: for 5.04% of normal numbers the current method gives the more precise result, for 20.73% this change gives the more precise result, and for the remaining 74.23% both methods give equal results.)
| | Current Method | With This Change |
|---|---|---|
| Normal Numbers | 5.04% | 20.73% |
| Denormal Numbers | 0% | 0% |
2 iterations:
| | Current Method | With This Change |
|---|---|---|
| Normal Numbers | 0% | 21.32% |
| Denormal Numbers | 0% | 0% |
1 iteration (using AVX512 rcp14 instruction):
| | Current Method | With This Change |
|---|---|---|
| Normal Numbers | 0.15% | 21.42% |
| Denormal Numbers | 0.11% | 16.16% |
2 iterations (using AVX512 rcp14 instruction):
| | Current Method | With This Change |
|---|---|---|
| Normal Numbers | 2e-07% | 17.24% |
| Denormal Numbers | 0% | 12.96% |
Edited by Chaofan Qiu