Fix TensorUInt128 division infinite loop on overflow
Summary
The TensorUInt128 division operator doubles d in a loop to find the largest power-of-2 multiple of rhs that fits in lhs. When d overflows 128 bits (wraps around), the loop condition lhs >= d never becomes false, causing an infinite loop.
This happens whenever lhs > 2^127, as reported in #3012.
Fix
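Add a single overflow check: before doubling d, test whether its high bit is set (d.high >> 63). If so, break out of the loop, because the next doubling would overflow. The subsequent binary long-division loop handles this case correctly, since d is then already the largest power-of-2 multiple of rhs that fits in 128 bits.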
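The guarded loop can be sketched in isolation as follows. This is a minimal stand-alone illustration, not the actual Eigen code: the `U128` struct, the helper functions, and `div_u128` are hypothetical names introduced here, and only the `d.high >> 63` check mirrors the change described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TensorUInt128: two 64-bit halves.
struct U128 {
  std::uint64_t high;
  std::uint64_t low;
};

inline bool ge(U128 a, U128 b) {
  return a.high != b.high ? a.high > b.high : a.low >= b.low;
}
inline bool is_zero(U128 a) { return a.high == 0 && a.low == 0; }
inline U128 add(U128 a, U128 b) {
  U128 r{a.high + b.high, a.low + b.low};
  if (r.low < a.low) ++r.high;  // carry out of the low half
  return r;
}
inline U128 sub(U128 a, U128 b) {
  U128 r{a.high - b.high, a.low - b.low};
  if (a.low < b.low) --r.high;  // borrow from the high half
  return r;
}
inline U128 shl1(U128 a) { return U128{(a.high << 1) | (a.low >> 63), a.low << 1}; }
inline U128 shr1(U128 a) { return U128{a.high >> 1, (a.low >> 1) | (a.high << 63)}; }

// Shift-and-subtract division with the overflow guard.
inline U128 div_u128(U128 lhs, U128 rhs) {
  assert(!is_zero(rhs));  // division by zero is not handled in this sketch
  if (!ge(lhs, rhs)) return U128{0, 0};

  U128 d = rhs;
  U128 power2{0, 1};
  while (ge(lhs, d)) {
    // Guard: if the top bit of d is set, d + d would wrap around 128 bits
    // and lhs >= d would stay true forever.
    if (d.high >> 63) break;
    d = shl1(d);
    power2 = shl1(power2);
  }

  // Binary long division: walk d and power2 back down, one bit per step.
  U128 rem = lhs;
  U128 result{0, 0};
  while (!is_zero(power2)) {
    if (ge(rem, d)) {
      rem = sub(rem, d);
      result = add(result, power2);
    }
    power2 = shr1(power2);
    d = shr1(d);
  }
  return result;
}
```

With the guard removed, a call such as `div_u128(U128{UINT64_MAX, UINT64_MAX}, U128{0, 3})` would never leave the first loop, because d eventually wraps to 0 and every lhs compares greater than or equal to it.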
Testing
- Added test_div_overflow() regression test with the exact reproduction case from #3012
- Additional edge cases: UINT128_MAX / 1, UINT128_MAX / 3
- All 7 existing subtests pass (GCC 15.2.0, C++20)
Closes #3012