
Add more eOps (misc. elementwise functions and trigonometric functions)

Ryan Curtin requested to merge rcurtin/bandicoot-code:add-more-eops into unstable

I went through and finished implementations of a whole boatload of elementwise utility functions. They are pretty boilerplate for now, but they are tested and they work. Future improvements may be possible.

  • exp2()

  • exp10()

  • trunc_exp()

  • log2()

  • log10()

  • trunc_log()

  • pow()

  • floor()

  • ceil()

  • round()

  • trunc()

  • sign()

  • erf()

  • erfc()

  • lgamma()

  • cos()

  • sin()

  • tan()

  • acos()

  • asin()

  • atan()

  • cosh()

  • sinh()

  • tanh()

  • acosh()

  • asinh()

  • atanh()

  • sinc()

  • atan2()

  • hypot()
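For the less common entries in the list above, here is a scalar sketch of what trunc_exp(), trunc_log(), sinc(), and sign() compute per element. It assumes these follow Armadillo's definitions of the same functions and has not been checked against the actual kernels. The `ref_*` names are hypothetical and are not bandicoot functions.

```cpp
#include <cmath>
#include <limits>

// Hypothetical scalar reference functions (not part of bandicoot), assuming
// the elementwise operations follow Armadillo's definitions.

// exp(x), but clamped to the largest finite value instead of overflowing to +inf.
template<typename eT>
eT ref_trunc_exp(const eT x)
{
  if (x >= std::log(std::numeric_limits<eT>::max()))
    return std::numeric_limits<eT>::max();
  return std::exp(x);
}

// log(x), but never -inf (for x <= 0) or +inf (for x == +inf).
template<typename eT>
eT ref_trunc_log(const eT x)
{
  if (x == std::numeric_limits<eT>::infinity())
    return std::log(std::numeric_limits<eT>::max());
  if (x <= eT(0))
    return std::log(std::numeric_limits<eT>::min());  // min() is the smallest positive normal value
  return std::log(x);
}

// Normalized sinc: sin(pi * x) / (pi * x), with sinc(0) defined as 1.
template<typename eT>
eT ref_sinc(const eT x)
{
  const eT pi = eT(3.14159265358979323846);
  const eT t = x * pi;
  return (t == eT(0)) ? eT(1) : std::sin(t) / t;
}

// sign(x): -1 for negative, 0 for zero, +1 for positive.
template<typename eT>
eT ref_sign(const eT x)
{
  return (x > eT(0)) ? eT(1) : ((x < eT(0)) ? eT(-1) : eT(0));
}
```

A CPU-side reference like this could be handy when comparing GPU results element by element.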

One problem that may need to be solved later is that small implementation differences can cause different results on the CPU and the GPU. For instance, suppose you have a Mat<u32> called X whose first element is 5, and you call pow(X, 2). On the GPU, X's elements are upcast to the same-size floating-point type (float), pow(5.0f, 2.0f) is called, and the result is cast back to a u32. However, on the GPU, pow(5.0f, 2.0f) produces a value just a tiny bit smaller than 25, and since the conversion back to u32 truncates toward zero, the result you get is 24! This does not occur on the CPU; it is purely the result of small implementation differences. (The error is within the bounds specified for CUDA's mathematical functions...)
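To make the failure mode concrete: converting a float to an unsigned integer truncates toward zero, so a result even one ULP below 25 becomes 24. The sketch below simulates that with std::nextafter rather than running anything on a GPU. The helper names are hypothetical, and cast_rounding() is only one possible mitigation, not something this MR implements.

```cpp
#include <cmath>
#include <cstdint>

// Simulates a pow() result that lands one ULP below the exact answer, as
// described above for pow(5.0f, 2.0f) on the GPU. No GPU code is run here.
inline float slightly_low_pow_result()
{
  return std::nextafter(25.0f, 0.0f);  // largest float strictly below 25
}

// A plain float -> u32 conversion truncates toward zero: 24.999998f becomes 24.
inline std::uint32_t cast_truncating(const float v)
{
  return static_cast<std::uint32_t>(v);
}

// Hypothetical mitigation: round to the nearest integer before converting
// back to the integral element type.
inline std::uint32_t cast_rounding(const float v)
{
  return static_cast<std::uint32_t>(std::lround(v));
}
```

Rounding is not a drop-in fix for every function, though: for sqrt() on integer inputs, truncation gives 2 for sqrt(8) while rounding gives 3, so any fix would need to match whatever conversion the CPU path uses.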

I am sure pow() is not the only function where this kind of thing can happen. I want to avoid casting to a higher-precision floating-point type (like double), because the user may be intentionally using smaller element types. Anyway, it is something to come back to later; the initial implementation is fine.

I'll let this sit for a handful of days and then merge it. As always, don't feel obligated to give a big, deep review: the amount of code here is intimidating (though mostly boilerplate), and the MR is meant more as an FYI anyway.
