Avoid integer overflows in EigenMetaKernel indexing

  • The current implementation computes size + total_threads. When size is close to the maximum value representable by the index type, this sum overflows and the kernel fails with CUDA_ERROR_ILLEGAL_ADDRESS.
  • The num_blocks calculation can also overflow due to the implementation of divup().
  • This patch prevents these overflows and allows the kernel to work correctly for the full representable range of tensor sizes.
  • Also adds relevant tests.
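
The two overflow patterns above can be illustrated with a minimal host-side sketch. This is not the code from this patch: safe_divup and strided_count are hypothetical names introduced here, and the sketch assumes the overflow in divup() comes from a rounding-up division that adds to the numerator before dividing. Both helpers assume non-negative inputs and a positive divisor or step.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not Eigen's divup()): ceil(x / y) for x >= 0, y > 0.
// It never forms x + y - 1, so it cannot exceed the range of Index.
template <typename Index>
inline Index safe_divup(Index x, Index y) {
  return x / y + (x % y != 0 ? 1 : 0);
}

// Hypothetical helper: number of indices visited by the strided loop
//   for (i = first; i < size; i += step)
// written so that i + step is only evaluated when it is known to be < size.
template <typename Index>
inline Index strided_count(Index first, Index size, Index step) {
  if (first >= size) return 0;
  Index count = 0;
  Index i = first;
  while (true) {
    ++count;
    // size - i is positive here and cannot overflow; stop before i + step
    // would reach or pass size.
    if (size - i <= step) break;
    i += step;
  }
  return count;
}
```

With a 32-bit signed index, safe_divup(INT32_MAX, 1024) stays in range, whereas a numerator of INT32_MAX + 1023 would not. The same idea applies on the device: compare the remaining distance to size against the stride instead of comparing a sum against size.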

cc @nluehr