Avoid integer overflows in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can overflow and cause `CUDA_ERROR_ILLEGAL_ADDRESS` when `size` is close to the maximum representable value.
- The `num_blocks` calculation can also overflow due to the implementation of `divup()`.
- This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
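
For reviewers, the two overflow patterns described above can be sketched on the host side as follows. This is a minimal illustration under stated assumptions, not the code in this patch: `safe_divup` and `count_visited` are hypothetical names, and the sketch assumes the problematic `divup()` form is the common `(x + y - 1) / y`, which the description does not spell out.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not Eigen's divup()): ceil(x / y) for x >= 0, y > 0.
// It never forms x + y - 1, so it cannot exceed the range of Index.
template <typename Index>
constexpr Index safe_divup(Index x, Index y) {
  return static_cast<Index>(x / y + (x % y != 0 ? 1 : 0));
}

// Hypothetical host-side model of one thread's grid-stride loop.
// Returns how many indices in [first, size) the thread visits when it
// advances by `step` (the total thread count). The loop compares the
// remaining distance (size - i) with step instead of computing i + step
// first, so the index never passes the maximum value of Index.
template <typename Index>
Index count_visited(Index first, Index step, Index size) {
  Index visited = 0;
  Index i = first;
  while (i < size) {
    ++visited;
    if (size - i <= step) break;  // next index would be >= size; stop before adding
    i += step;
  }
  return visited;
}

// Compile-time checks at the top of the 32-bit signed range.
static_assert(safe_divup<std::int32_t>(std::numeric_limits<std::int32_t>::max(), 1024) == 2097152,
              "block count near INT32_MAX");
static_assert(safe_divup<std::int32_t>(0, 1024) == 0, "empty tensor needs no blocks");
```

The design choice in both helpers is the same: rearrange the arithmetic so that every intermediate value stays at or below `size`, rather than widening the index type.
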
cc @nluehr