Avoid integer overflows in EigenMetaKernel indexing
- The current implementation computes `size + total_threads`, which can overflow and cause `CUDA_ERROR_ILLEGAL_ADDRESS` when `size` is close to the maximum representable value.
- The `num_blocks` calculation can also overflow due to the implementation of `divup()`.
- This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes.
- Also adds relevant tests.
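
For illustration only, here is a minimal host-side sketch of the two overflow patterns described above and one way to avoid them. The names `safe_divup` and `count_strided_iterations` are hypothetical and do not come from the patch; the sketch assumes a ceiling division of the form `(x + y - 1) / y` and a strided loop of the form `i += step`, which may differ from the actual Eigen code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not Eigen's divup): ceiling division that never
// forms the intermediate x + y - 1, so it cannot overflow when x is
// close to the maximum of Index. Assumes x >= 0 and y > 0.
template <typename Index>
constexpr Index safe_divup(Index x, Index y) {
  return x == 0 ? Index(0) : Index((x - 1) / y + 1);
}

// Hypothetical host-side model of one thread's strided loop. It counts
// how many indices in [first, size) the thread visits when advancing by
// step. The loop exits by comparing the remaining distance (size - i)
// against step, so the index never advances past size and i + step is
// never computed when it would overflow.
template <typename Index>
Index count_strided_iterations(Index first, Index size, Index step) {
  assert(step > 0);
  if (first >= size) return 0;
  Index count = 0;
  Index i = first;
  while (true) {
    ++count;                      // "process" element i
    if (size - i <= step) break;  // next index would be >= size
    i += step;                    // safe: i + step < size
  }
  return count;
}
```

With a 32-bit `Index`, `size` equal to the maximum value, and a stride of `1 << 20`, the modeled thread visits 2048 elements and the index never exceeds `size`, whereas a plain `for (i = first; i < size; i += step)` loop would compute an out-of-range index on its final increment.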
cc @nluehr