Fix CUDA `max_3` test and some subtle bugs (!47) · Merge requests · bandicoot-lib / bandicoot-code

Ryan Curtin requested to merge rcurtin/bandicoot-code:fix-max_3 into unstable Mar 08, 2023

The `max_3` test in `tests/max.cpp` was known to fail on the CUDA backend. Today I looked into it and uncovered some other small issues along the way:

- Adapted max() and min() to use generic_reduce() (just a cleanup).
- Found that generic_reduce() for CUDA was only handling one element per thread! Fixed.
- Discovered some very subtle bugs in the max/min CUDA kernels that meant that the second element inspected by every thread would always be ignored.
- Fixing those CUDA kernels required bumping the patch version.
- Fixed a simple compilation bug for the expression max(max(abs(X))).
- Seems like the accu benchmark program got modified somewhere along the way, so I reverted it back to what it was.
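
To illustrate the generic_reduce() item above, here is a minimal CPU-side sketch of why a reduction that handles only one element per thread gives wrong answers once the input is larger than the number of threads. This is not the bandicoot kernel code: `strided_max` and `one_element_per_thread_max` are hypothetical names, the "threads" are simulated with a plain loop, and the way the bug is modeled is an assumption based only on the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU model of a strided reduction (not bandicoot's kernel).
// Simulated thread t visits elements t, t + T, t + 2T, ... so the whole
// input is covered even when data.size() > num_threads.
float strided_max(const std::vector<float>& data, std::size_t num_threads)
{
  assert(num_threads > 0);
  std::vector<float> partial(num_threads, std::numeric_limits<float>::lowest());

  for (std::size_t t = 0; t < num_threads; ++t)
  {
    for (std::size_t i = t; i < data.size(); i += num_threads)
    {
      partial[t] = std::max(partial[t], data[i]);
    }
  }

  // Combine the per-thread partial results into the final answer.
  return *std::max_element(partial.begin(), partial.end());
}

// Assumed model of the bug: each simulated thread looks at exactly one
// element, so anything past the first num_threads elements is never seen.
float one_element_per_thread_max(const std::vector<float>& data,
                                 std::size_t num_threads)
{
  float result = std::numeric_limits<float>::lowest();
  const std::size_t n = std::min(num_threads, data.size());

  for (std::size_t t = 0; t < n; ++t)
  {
    result = std::max(result, data[t]);
  }

  return result;
}
```

With the input `{1, 2, 3, 9, 4}` and two simulated threads, `strided_max` sees every element, while `one_element_per_thread_max` only ever inspects the first two and misses the 9. A test like `max_3` on a sufficiently large input would catch exactly this kind of discrepancy.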
