Add vector `norm()` for non-subviews (!40)

Ryan Curtin requested to merge rcurtin/bandicoot-code:norm into unstable Feb 02, 2023

This one quickly got out of hand.

I thought I could just use cuBLAS and clBLAS to implement `norm()`, but there are many different norm types (1, 2, k, min, max), they apply differently to vectors and matrices, and neither cuBLAS nor clBLAS supports all of them. On top of that, clBLAS's 2-norm implementation needs so much auxiliary space (2n for a vector of length n!) that I chose to implement my own kernel for it instead.
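For reference, these are the standard definitions of the vector norm types listed above, for a vector x of length n. The "min" case is simply the smallest absolute value and is not a norm in the strict mathematical sense; the "max" case is the usual ∞-norm, which is exactly the quantity `max_abs()` computes.

```math
\|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad
\|x\|_2 = \Big(\sum_{i=1}^{n} |x_i|^2\Big)^{1/2}, \qquad
\|x\|_k = \Big(\sum_{i=1}^{n} |x_i|^k\Big)^{1/k}, \qquad
\|x\|_{\max} = \max_{i} |x_i|, \qquad
\|x\|_{\min} = \min_{i} |x_i|
```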

A bunch of other things happened too:

- I refactored `accu()`, `min()`, `max()`, and `max_abs()` for each backend into a function called `generic_reduce()`, because they all use the same general strategy and differ only in their kernels.
- I found a bug in all the reductions implemented in OpenCL (I used `get_global_size(0)` instead of `get_local_size(0)`); fixing it makes some of the `accu`/`min`/`max` tests that were failing only on the OpenCL backend pass.
- I implemented kernels for `norm_1`, `norm_k`, `norm_2` (OpenCL only), `norm_2_robust` (OpenCL only), and `norm_min`.
- For `norm_max` we can just use `max_abs()`.
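A rough CPU-side sketch of the shared reduction strategy may help reviewers. It simulates GPU work-groups with plain loops: each group grid-strides over the input, applies a per-element map (identity, abs, square), combines into a local scratch buffer, and then tree-reduces that buffer with a stride that starts at half the *local* work-group size (the analogue of `get_local_size(0)`), which is one plausible place a global/local size mix-up bites. A second pass folds the per-group partials. All names and signatures here (`reduce_pass`, this `generic_reduce` signature, the default sizes) are hypothetical, and the scale-by-max approach in `norm_2_robust` is an assumed technique; bandicoot's actual kernels may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical CPU simulation of a two-pass, work-group-based GPU reduction.
// None of these names or signatures are bandicoot's real API.
using MapFn = std::function<double(double)>;              // per-element transform
using CombineFn = std::function<double(double, double)>;  // associative combine

const MapFn as_is = [](double v) { return v; };
const MapFn abs_val = [](double v) { return std::abs(v); };
const MapFn square = [](double v) { return v * v; };
const CombineFn add = std::plus<double>();
const CombineFn take_max = [](double a, double b) { return std::max(a, b); };
const CombineFn take_min = [](double a, double b) { return std::min(a, b); };

// One pass: `num_groups` simulated work-groups each produce one partial result.
std::vector<double> reduce_pass(const std::vector<double>& in, std::size_t local_size,
                                std::size_t num_groups, double identity,
                                const MapFn& map, const CombineFn& combine) {
  // The tree step below assumes a power-of-two work-group size.
  assert(local_size > 0 && (local_size & (local_size - 1)) == 0);
  const std::size_t global_size = local_size * num_groups;
  std::vector<double> partials(num_groups, identity);
  for (std::size_t g = 0; g < num_groups; ++g) {
    std::vector<double> local_mem(local_size, identity);  // stands in for __local memory
    // Each work-item l accumulates a grid-stride slice of the input.
    for (std::size_t l = 0; l < local_size; ++l)
      for (std::size_t i = g * local_size + l; i < in.size(); i += global_size)
        local_mem[l] = combine(local_mem[l], map(in[i]));
    // In-group tree reduction: the stride starts at half the LOCAL size.
    // Starting from half the global size would index past local_mem.
    for (std::size_t s = local_size / 2; s > 0; s /= 2)
      for (std::size_t l = 0; l < s; ++l)  // on a GPU: parallel, then a barrier
        local_mem[l] = combine(local_mem[l], local_mem[l + s]);
    partials[g] = local_mem[0];
  }
  return partials;
}

// Shared driver: only identity/map/combine differ between operations.
double generic_reduce(const std::vector<double>& x, double identity, const MapFn& map,
                      const CombineFn& combine, std::size_t local_size = 8,
                      std::size_t num_groups = 4) {
  const std::vector<double> partials =
      reduce_pass(x, local_size, num_groups, identity, map, combine);
  // Second pass: a single group folds the per-group partials (no map this time).
  return reduce_pass(partials, local_size, 1, identity, as_is, combine)[0];
}

double accu(const std::vector<double>& x) { return generic_reduce(x, 0.0, as_is, add); }
double max_abs(const std::vector<double>& x) { return generic_reduce(x, 0.0, abs_val, take_max); }
double norm_1(const std::vector<double>& x) { return generic_reduce(x, 0.0, abs_val, add); }
double norm_2(const std::vector<double>& x) { return std::sqrt(generic_reduce(x, 0.0, square, add)); }
double norm_min(const std::vector<double>& x) {
  return generic_reduce(x, std::numeric_limits<double>::infinity(), abs_val, take_min);
}
double norm_max(const std::vector<double>& x) { return max_abs(x); }  // reuse max_abs()

// Robust 2-norm (assumed approach): divide by the largest magnitude before
// squaring so that large inputs do not overflow to inf.
double norm_2_robust(const std::vector<double>& x) {
  const double scale = max_abs(x);
  if (scale == 0.0 || std::isinf(scale)) return scale;
  const MapFn scaled_square = [scale](double v) { const double t = v / scale; return t * t; };
  return scale * std::sqrt(generic_reduce(x, 0.0, scaled_square, add));
}
```

Keeping identity, map, and combine as the only per-operation inputs is what lets one driver cover sums, extrema, and norms; the real `generic_reduce()` presumably dispatches to backend-specific kernels rather than taking C++ callables. The sketch also shows why the robust variant exists: for `{3e200, 4e200}`, `norm_2` overflows to inf while `norm_2_robust` returns 5e200.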

This MR does not cover the following things, but I'll handle them in future MRs:

- Matrix norms.
- Norms on subviews (those will require separate kernels).

I'll merge this in a couple days in case there are any comments.
