Improve accuracy of full tensor reduction for half and bfloat16 (!676) · Merge requests · libeigen / eigen

We use a tree summation algorithm for full tensor reduction. The relative error in summing n positive elements this way is bounded by approximately 2*eps*(log(n/B) + B), where B is the leaf size of the tree; within each leaf we sum sequentially in the interest of speed. For less accurate types (i.e., types with larger eps, such as half and bfloat16), we reduce B to keep the relative error significantly below 1.

Edited Oct 20, 2021 by Rasmus Munk Larsen
