Avoid integer overflow in EigenMetaKernel indexing (v2) (!713)

This is a re-submission of !681 (merged), which was reverted due to build issues on Windows.

This version has two changes compared to the previous version:

It doesn't use inline PTX, so there shouldn't be any build issues on Windows.
It only uses saturating addition in each loop iteration when overflow is possible (i.e., when the size is within total_threads of the maximum representable index). When overflow is not possible, regular addition is used.
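
The second change can be illustrated with a host-side sketch. This is not the code from the patch: `saturating_add` and `count_visited` are hypothetical names, `step` stands in for total_threads, and all values are assumed to be non-negative. It only shows the idea of choosing between a plain strided loop and a saturating one based on how close the size is to the maximum index.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: add two non-negative values, clamping the result
// at the maximum representable Index instead of overflowing.
template <typename Index>
Index saturating_add(Index a, Index b) {
  const Index max_index = std::numeric_limits<Index>::max();
  return (a > max_index - b) ? max_index : a + b;
}

// Hypothetical stand-in for one thread's strided loop. Returns how many
// indices in [first, size) the thread visits when stepping by `step`.
template <typename Index>
Index count_visited(Index first, Index size, Index step) {
  const Index max_index = std::numeric_limits<Index>::max();
  Index visited = 0;
  if (size <= max_index - step) {
    // Fast path: i < size guarantees i + step <= max_index,
    // so plain addition cannot overflow.
    for (Index i = first; i < size; i += step) ++visited;
  } else {
    // Slow path: size is within `step` of max_index, so clamp the
    // increment. A clamped value is never below size, so the loop ends.
    for (Index i = first; i < size; i = saturating_add(i, step)) ++visited;
  }
  return visited;
}
```

The branch is taken once, outside the loop, so the common case pays no per-iteration cost for the overflow check.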

Summary of changes:

The current implementation computes size + total_threads, which can overflow and cause CUDA_ERROR_ILLEGAL_ADDRESS when size is close to the maximum representable value.
The num_blocks calculation can also overflow due to the implementation of divup().
This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes.
Also adds relevant tests.
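
As a rough illustration of the divup() issue: a ceiling division written as `(x + y - 1) / y` (assumed here to be the formulation the description refers to) overflows in the intermediate sum when x is near the maximum value. The sketch below shows one overflow-free alternative for non-negative x and positive y. `divup_safe` is a hypothetical name, and this is not necessarily the approach taken in the patch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical overflow-free ceiling division for x >= 0 and y > 0.
// It never forms x + y - 1, so no intermediate value exceeds x.
template <typename Index>
Index divup_safe(Index x, Index y) {
  return x / y + (x % y != 0 ? 1 : 0);
}
```

For example, with a 32-bit signed index, x = 2147483647 and y = 1024, the sum x + y - 1 is not representable, while `divup_safe` returns 2097152.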