FR: compile "n - n mod CONST" as "n div CONST * CONST" for all positive CONSTs.
In general, the latter is easier and uses one fewer register. #39612 shows that this operation occurs frequently when dealing with vectorized batch operations and their tails, and that it can be subject to further multiplications (implicitly by `sizeof(single)` in this case) that fold easily with `n div CONST * CONST` but not with `n - n mod CONST`.

When `CONST` is not a power of two:

- For signed `n`, my trunk compiler uses `IDIV` for `n - n mod CONST` (on `x86-32/win32` as well as `x86-64/win64`) but not for `n div CONST * CONST`.
- For unsigned `n`, `n div CONST * CONST` saves around three `x86` instructions.

When `CONST` is a power of two:

- For signed `n`, `n div CONST * CONST` saves around two `x86` instructions.
- For unsigned `n`, the compiler already optimizes `n div CONST * CONST`, but not `n - n mod CONST`, into `n and not NType(CONST - 1)`.

Test with switchable divisor and signedness:

```pascal
{-$define unsigned}
const
	Divisor = 17;
	TestRange = 3 * Divisor;

var
	i, x, r1, r2: {$ifdef unsigned} uint32 {$else} int32 {$endif};
	ok: boolean;

begin
	ok := true;
	for i := {$ifdef unsigned} 0 {$else} -2 * TestRange {$endif} to 2 * TestRange do
	begin
	{$ifndef unsigned}
		if i < -TestRange then
			x := Low(x) + (i - (-2 * TestRange)) // test [Low(x); Low(x) + TestRange)
		else
	{$endif}
		if i <= TestRange then
			x := i // test [-TestRange; TestRange] or [0; TestRange]
		else
			x := High(x) - (i - TestRange - 1); // test (High(x) - TestRange; High(x)]
		r1 := x - x mod Divisor;
		r2 := x div Divisor * Divisor;
		if r1 <> r2 then
		begin
			writeln('FAIL: x=', x, ', x - x mod ', Divisor, ' = ', r1, ', x div ', Divisor, ' * ', Divisor, ' = ', r2);
			ok := false;
		end;
	end;
	if ok then writeln('ok');
end.
```
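For reference, a short sketch of why the two forms agree and why the multiplied form folds better. The symbols are my own shorthand, not anything from the compiler: `c` stands for `CONST`, `k` for its exponent in the power-of-two case, and `s` for a trailing constant factor such as `sizeof(single)`. The sketch assumes `div` and `mod` are a consistent pair, i.e. they satisfy the first line for every `n`, which is the property the test above checks.

```math
\begin{aligned}
n &= (n \operatorname{div} c)\cdot c + (n \bmod c) \\
\Rightarrow\quad n - (n \bmod c) &= (n \operatorname{div} c)\cdot c \\
\text{unsigned } n,\ c = 2^k:\quad n - (n \bmod c) &= n - \bigl(n \mathbin{\mathrm{and}} (c - 1)\bigr) = n \mathbin{\mathrm{and}} \operatorname{not}(c - 1) \\
\bigl((n \operatorname{div} c)\cdot c\bigr)\cdot s &= (n \operatorname{div} c)\cdot (c\,s)
\end{aligned}
```

The last line is the folding case: with the product form, `c * s` collapses into a single constant multiplier, whereas `(n - n mod c) * s` has no such shortcut without first applying the identity in the second line.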