Column-wise Lp1 sum resulting in aliased calculation when rows==2
Submitted by Alex S.
Assigned to Nobody
Link to original bugzilla bug (#1731)
Version: 3.3 (current stable)
Description
Created attachment 946
A counterexample.
Consider the following code (minimal example produced thanks to distinguished IRC user @ChriSopht, who happened to have enough free time before I did and was kind enough to use it for this):
https://godbolt.org/z/frbXeS
(in case Godbolt ever, god forbid, goes down, also attached).
The crux of it is the line:
arr.rowwise() /= arr.colwise().sum();
after which you'd expect each column to sum up to one (barring NaNs). Instead, part of the array will contain corrupted data.
The bug surfaces only when the number of rows is (a) known at compile time and (b) exactly 2.
Further investigation shows that the first row of the array is normalized as expected, but the second seems to be divided by the updated sum. Therefore, it is likely an aliasing issue.
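To make the suspected mechanism concrete, here is a minimal sketch in plain C++ that does not use Eigen at all. The two-row layout and the function names (normalize_aliased, normalize_with_temporary) are made up for illustration; this only mimics the behaviour described above and is not a claim about what Eigen's generated code actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2 x N array: two rows of equal length.

// Mimics the suspected behaviour: the column sum is re-read from the live
// data for every coefficient instead of being computed once up front.
void normalize_aliased(std::vector<double>& row0, std::vector<double>& row1) {
    assert(row0.size() == row1.size());
    // Row 0 is divided by the original column sums, as expected.
    for (std::size_t c = 0; c < row0.size(); ++c) {
        row0[c] /= row0[c] + row1[c];
    }
    // Row 1 is divided by sums that already contain the normalized row 0.
    for (std::size_t c = 0; c < row1.size(); ++c) {
        row1[c] /= row0[c] + row1[c];
    }
}

// Evaluates the column sums into a temporary before touching the data,
// so both rows are divided by the same, original sums.
void normalize_with_temporary(std::vector<double>& row0, std::vector<double>& row1) {
    assert(row0.size() == row1.size());
    std::vector<double> sums(row0.size());
    for (std::size_t c = 0; c < sums.size(); ++c) {
        sums[c] = row0[c] + row1[c];
    }
    for (std::size_t c = 0; c < sums.size(); ++c) {
        row0[c] /= sums[c];
        row1[c] /= sums[c];
    }
}
```

For a column holding (1, 3), the aliased version yields 1/4 in the first row but 3/3.25 in the second, so the column no longer sums to one; the version with a temporary yields (0.25, 0.75). In Eigen itself, forcing the sums into a temporary with .eval() on the right-hand side would presumably have the same effect, though I have not verified that against this particular case.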
Increasing the number of rows presumably raises the evaluation cost of the sum enough to warrant a temporary, but I cannot easily test that hypothesis, as I'm not much of an assembly wizard.
It is clearly possible to encode this operation as a loop using O(1) extra memory, individually summing and dividing each column. Whether this can be accounted for with expression templates, I do not know. It is understandable if this is declared to be a clear aliasing problem and decided not to be worth fixing, but I think it at least warrants a documentation warning, as the use case is not obscure, and it also happens to work most of the time.
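For reference, the O(1) loop I have in mind looks roughly like the sketch below. It is plain C++ rather than Eigen; the flat column-major std::vector and the function name normalize_columns_in_place are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalizes each column of a rows x cols matrix so that it sums to one.
// The matrix is stored column-major in a flat vector (an assumption made
// for this sketch). Only one scalar of extra storage is needed per column.
void normalize_columns_in_place(std::vector<double>& data,
                                std::size_t rows, std::size_t cols) {
    assert(data.size() == rows * cols);
    for (std::size_t c = 0; c < cols; ++c) {
        double* col = data.data() + c * rows;
        // First pass: sum the column before modifying any of its entries.
        double sum = 0.0;
        for (std::size_t r = 0; r < rows; ++r) {
            sum += col[r];
        }
        // Second pass: divide every entry by the saved sum.
        for (std::size_t r = 0; r < rows; ++r) {
            col[r] /= sum;
        }
    }
}
```

Because each column's sum is saved in a local scalar before the division starts, there is nothing left to alias, regardless of the number of rows. A column whose sum is zero still produces NaN or infinity, as with the original expression.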
Best regards, Alex.
Attachment 946, "A counterexample.":
lp1-bug-counterexample.cpp