Fix CUDA `max_3` test and some subtle bugs

Ryan Curtin requested to merge rcurtin/bandicoot-code:fix-max_3 into unstable

The `max_3` test in tests/max.cpp was known to be failing for the CUDA backend. Today I looked into it and uncovered some other small issues along the way:

  • Adapted max() and min() to use generic_reduce() (just a cleanup).
  • Found that generic_reduce() for CUDA was only handling one element per thread! Fixed.
  • Discovered some very subtle bugs in the max/min CUDA kernels that meant that the second element inspected by every thread would always be ignored.
  • Fixing those CUDA kernels required bumping the patch version.
  • Fixed a simple compilation bug for the expression max(max(abs(X))).
  • Seems like the accu benchmark program got modified somewhere along the way, so I reverted it to its earlier state.
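
For readers unfamiliar with the reduction pattern involved, here is a rough CPU-side sketch of the first pass of a max reduction within one thread block, where each simulated thread inspects two elements per iteration. This is not the bandicoot kernel code: `block_max`, `aux`, and `num_threads` are hypothetical names chosen for illustration, and the loop structure is only an assumption about how such a kernel is typically laid out.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU emulation of one thread block in a max-reduction kernel.
// Each simulated thread `tid` first accumulates a private maximum over a
// strided slice of the input, then the per-thread results are combined with
// a tree reduction over `aux` (standing in for shared memory).
// Assumption: `num_threads` is a power of two, as the tree step requires.
inline float block_max(const std::vector<float>& in, std::size_t num_threads)
{
  std::vector<float> aux(num_threads, std::numeric_limits<float>::lowest());

  for (std::size_t tid = 0; tid < num_threads; ++tid)
  {
    std::size_t i = tid;
    const std::size_t stride = 2 * num_threads;

    // Two elements per iteration: in[i] and in[i + num_threads].
    // Dropping the second std::max() here would reproduce the kind of bug
    // described above, where every thread's second element is ignored.
    while (i + num_threads < in.size())
    {
      aux[tid] = std::max(aux[tid], in[i]);
      aux[tid] = std::max(aux[tid], in[i + num_threads]);
      i += stride;
    }

    // Leftover element when no second element remains in range.
    if (i < in.size())
    {
      aux[tid] = std::max(aux[tid], in[i]);
    }
  }

  // Tree reduction over the per-thread maxima.
  for (std::size_t s = num_threads / 2; s > 0; s /= 2)
  {
    for (std::size_t tid = 0; tid < s; ++tid)
    {
      aux[tid] = std::max(aux[tid], aux[tid + s]);
    }
  }

  return aux[0];
}
```

With two simulated threads and the input `{1, 5, 3, 9, 2, 7, 4}`, the maximum 9 sits at index 3, which is the second element thread 1 inspects. A version that skipped the second load would return 7 for this input, which is why a test needs the maximum in such a position to catch the bug.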
