FR: compile "n - n mod CONST" as "n div CONST * CONST" for all positive CONSTs.
In general, the latter is simpler to compute and uses one fewer register.
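For reference, the rewrite rests on the usual division identity. A short derivation, assuming only that `div` and `mod` are a consistent pair (truncating `div`, `mod` with the sign of the dividend, which as far as I know is the FPC default), with `C` standing for `CONST`:

```math
n = (n \operatorname{div} C)\cdot C + (n \bmod C)
\quad\Longrightarrow\quad
n - (n \bmod C) = (n \operatorname{div} C)\cdot C
```

Both sides lie between `0` and `n`, so neither form can overflow where the other does not.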
#39612 shows that this operation occurs frequently when dealing with vectorized batch operations and their tails. The result can also be subjected to further multiplications (implicitly by `sizeof(single)` in this case), which fold easily with `n div CONST * CONST` but not with `n - n mod CONST`.
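A minimal sketch of the batch/tail pattern meant here. It is not the code from #39612; `BatchSize`, `TailOffsetViaMod` and `TailOffsetViaDiv` are hypothetical names chosen for illustration. It computes the byte offset at which the tail starts in both forms:

```pascal
program TailOffsetSketch;
{$mode objfpc}

const
  BatchSize = 8; // hypothetical number of elements per vectorized batch

// Byte offset of the tail, written in the "n - n mod CONST" form.
function TailOffsetViaMod(n: SizeUint): SizeUint;
begin
  Result := (n - n mod BatchSize) * SizeOf(single);
end;

// Same offset in the "n div CONST * CONST" form; the two constant
// factors could be merged into one: BatchSize * SizeOf(single).
function TailOffsetViaDiv(n: SizeUint): SizeUint;
begin
  Result := n div BatchSize * BatchSize * SizeOf(single);
end;

var
  n: SizeUint;
begin
  // Both forms must agree for every element count.
  for n := 0 to 100 do
    if TailOffsetViaMod(n) <> TailOffsetViaDiv(n) then
    begin
      writeln('mismatch at n=', n);
      Halt(1);
    end;
  writeln('ok');
end.
```

In the `div` form the trailing constants form a single multiplication chain, which is what makes the fold straightforward; in the `mod` form the subtraction sits between them.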
When `CONST` is not a power of two:
- For signed `n`, my trunk compiler uses `IDIV` for `n - n mod CONST` (on `x86-32/win32` as well as `x86-64/win64`) but not for `n div CONST * CONST`.
- For unsigned `n`, `n div CONST * CONST` saves around three `x86` instructions.
When `CONST` is a power of two:
- For signed `n`, `n div CONST * CONST` saves around two `x86` instructions.
- For unsigned `n`, the compiler already optimizes `n div CONST * CONST`, but not `n - n mod CONST`, into `n and not NType(CONST - 1)`.
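To illustrate the unsigned power-of-two case, here is a small sketch checking that all three forms agree. `Pow2` and the `RoundDownVia*` functions are hypothetical names, not anything from the compiler:

```pascal
program PowerOfTwoSketch;
{$mode objfpc}

const
  Pow2 = 16; // hypothetical power-of-two divisor

function RoundDownViaMod(n: uint32): uint32;
begin
  Result := n - n mod Pow2;
end;

function RoundDownViaDiv(n: uint32): uint32;
begin
  Result := n div Pow2 * Pow2;
end;

// The form the unsigned "div * CONST" case is already reduced to.
function RoundDownViaMask(n: uint32): uint32;
begin
  Result := n and not uint32(Pow2 - 1);
end;

// Stops the program if the three forms disagree for n.
procedure Check(n: uint32);
begin
  if (RoundDownViaMod(n) <> RoundDownViaMask(n)) or
     (RoundDownViaDiv(n) <> RoundDownViaMask(n)) then
  begin
    writeln('mismatch at n=', n);
    Halt(1);
  end;
end;

var
  i: uint32;
begin
  for i := 0 to 99 do
  begin
    Check(i);                // values near 0
    Check(High(uint32) - i); // values near High(uint32)
  end;
  writeln('ok');
end.
```

The plain mask is only valid for unsigned `n`: for a negative signed `n` it rounds toward negative infinity, while `div` truncates toward zero, so the signed case needs a correction step.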
Test with switchable divisor and signedness:
```pascal
{-$define unsigned}
const
  Divisor = 17;
  TestRange = 3 * Divisor;
var
  i, x, r1, r2: {$ifdef unsigned} uint32 {$else} int32 {$endif};
  ok: boolean;
begin
  ok := true;
  for i := {$ifdef unsigned} 0 {$else} -2 * TestRange {$endif} to 2 * TestRange do
  begin
{$ifndef unsigned}
    if i < -TestRange then
      x := Low(x) + (i - (-2 * TestRange)) // test [Low(x); Low(x) + TestRange)
    else
{$endif}
    if i <= TestRange then x := i // test [-TestRange; TestRange] or [0; TestRange]
    else x := High(x) - (i - TestRange - 1); // test (High(x) - TestRange; High(x)]
    r1 := x - x mod Divisor;
    r2 := x div Divisor * Divisor;
    if r1 <> r2 then
    begin
      writeln('FAIL: x=', x, ', x - x mod ', Divisor, ' = ', r1, ', x div ', Divisor, ' * ', Divisor, ' = ', r2);
      ok := false;
    end;
  end;
  if ok then writeln('ok');
end.
```