Consolidate CUDA CUB method calls
Change the getExclusiveScanWorkingArraySize method so that it works with both CUDA CUB method calls.
The same method will also work later for HIP rocPRIM and the SYCL equivalent.
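
A minimal host-only sketch of how one size query could be shared by several scan calls. It assumes each call follows the CUB convention, where passing a null temporary-storage pointer makes the call report the required byte count instead of running the scan. `mockExclusiveSum` and the template signature shown here are hypothetical stand-ins for illustration; the actual method in this change may be shaped differently.

```cpp
#include <cstddef>

// Hypothetical stand-in for a CUB-style scan entry point (not a real
// library call). With a null temp-storage pointer it only reports the
// scratch size it would need; otherwise it performs an exclusive sum.
int mockExclusiveSum(void* tempStorage, std::size_t& tempStorageBytes,
                     const int* in, int* out, int numItems)
{
  if (tempStorage == nullptr) {
    // Size-query phase. The requirement of one int per item is made up.
    tempStorageBytes = static_cast<std::size_t>(numItems) * sizeof(int);
    return 0;
  }
  int running = 0;
  for (int i = 0; i < numItems; ++i) {
    const int value = in[i];  // read before writing so in-place scans work
    out[i] = running;
    running += value;
  }
  return 0;
}

// Hypothetical backend-agnostic size query: forwards its arguments to
// whichever scan call is supplied, passing a null temp-storage pointer so
// that the call only fills in the required byte count.
template <typename ScanCall, typename... Args>
std::size_t getExclusiveScanWorkingArraySize(ScanCall scan, Args... args)
{
  std::size_t bytes = 0;
  scan(nullptr, bytes, args...);
  return bytes;
}
```

Under these assumptions, `getExclusiveScanWorkingArraySize(mockExclusiveSum, in, out, n)` returns the scratch size for that call. Any other entry point with the same leading two parameters could be passed in its place, which is what would let a single helper serve more than one backend.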