Wrong offsets in array of strings of size 5 or 11 when peephole optimizer enabled for 80386 CPU

Summary

When elements of an array of string[5] or string[11] are accessed, the byte offsets from the base address of the array are calculated incorrectly.

System Information

Operating system: Go32v2 and 32-bit Windows.
Processor architecture: x86
Compiler version: 3.x
Device: Computer

Steps to reproduce

Compile the example project. For Go32v2:

fpc -Tgo32v2 -O1 -Op80386 -Oopeephole test.pas

For 32-bit Windows:

fpc -Twin32 -O1 -Op80386 -Oopeephole test.pas

Both peephole optimization (-Oopeephole) and the 80386 CPU target (-Op80386) must be enabled to trigger the bug.

Example project

const
  StrLen = 11 {5 or 11};
  Str: array [0..15] of string[StrLen] = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15');
var
  I: Integer;
  S: ^string;
begin
  WriteLn('@Str=', HexStr(PtrUInt(@Str), 8));
  for I := Low(Str) to High(Str) div 4 do
  begin
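    { Marker: the jump skips the bytes 90 CC (nop; int3), which makes this spot easy to find in the compiled code }
    {$IFDEF CPU16}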
      asm jmp @1; nop; int3; @1: end;
    {$ELSE}
      asm jmp .L1; nop; int3; .L1: end;
    {$ENDIF}
    S := @Str[I];
    WriteLn('Str[', I, ']=@', HexStr(PtrUInt(S), 8), '=@Str+', PtrUInt(S) - PtrUInt(@Str), '=''', S^, '''');
  end;
end.

What is the current bug behavior?

For string[5] (array element size of 6 bytes), the index is multiplied by 10 instead of 6 to get the offset. For string[11] (element size of 12 bytes), the index is multiplied by 36 instead of 12.

What is the expected (correct) behavior?

The index should be multiplied by the element size to get the byte offset. The behavior is correct in FPC 2.6.4, and also in FPC 3.x when targeting i8086.
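
The expected relationship can also be checked mechanically. The following is a minimal sketch, not part of the original test case: it compares the offset of each element with a running sum of SizeOf(Str[0]). The program name and the variables Expected and Actual are made up for illustration. The expected offset is built by repeated addition so that it does not itself depend on a compiled multiplication.

```pascal
program CheckOffsets;
{ Sketch: compares each element's actual byte offset with the expected one. }
const
  StrLen = 11 {5 or 11};
  Str: array [0..3] of string[StrLen] = ('0', '1', '2', '3');
var
  I: Integer;
  Expected, Actual: PtrUInt;
begin
  Expected := 0;
  for I := Low(Str) to High(Str) do
  begin
    { Offset as computed by the compiler's array indexing }
    Actual := PtrUInt(@Str[I]) - PtrUInt(@Str);
    if Actual = Expected then
      WriteLn('Str[', I, ']: offset ', Actual, ' ok')
    else
      WriteLn('Str[', I, ']: offset ', Actual, ', expected ', Expected);
    { Advance by one element size, using addition only }
    Expected := Expected + SizeOf(Str[0]);
  end;
end.
```

Compiled with the same switches as test.pas, every line should end in "ok" on an unaffected compiler; on an affected one, the lines for I > 0 should report the mismatch.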

Relevant logs and/or screenshots

For string[5]:

@Str=004090D0
Str[0]=@004090D0=@Str+0='0'
Str[1]=@004090DA=@Str+10=' 2    3    4    5    6    '
Str[2]=@004090E4=@Str+20=''
Str[3]=@004090EE=@Str+30='5'

For string[11]:

@Str=004090D0
Str[0]=@004090D0=@Str+0='0'
Str[1]=@004090F4=@Str+36='3'
Str[2]=@00409118=@Str+72='6'
Str[3]=@0040913C=@Str+108='9'

Look for the bytes 0x90 0xCC (the nop; int3 marker from the inline asm block) in the compiled code to find the offset calculation that follows. The multiplication of eax by 6 is converted to:

8D 04 00               lea   eax,[eax][eax]
8D 04 80               lea   eax,[eax][eax]*4

which is actually multiplication by 10. Multiplication of eax by 12 becomes:

8D 04 85 00 00 00 00   lea   eax,[eax]*4[0]
8D 04 C0               lea   eax,[eax][eax]*8

that is, multiplication by 36. In FPC 2.6.4, these compiled correctly to:

8D 04 40               lea   eax,[eax][eax]*2
01 C0                  add   eax,eax

and

8D 04 40               lea   eax,[eax][eax]*2
8D 04 85 00 00 00 00   lea   eax,[eax]*4[0]
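
The multipliers quoted above can be verified by modelling each two-instruction sequence as plain integer arithmetic. The sketch below does this in Pascal. The function names are hypothetical and exist only for this illustration; each function mirrors one of the disassembly listings, and passing 1 yields the effective multiplier.

```pascal
program LeaFactors;
{ Sketch: models the lea/add sequences from the disassembly as arithmetic. }

{ FPC 3.x, element size 6: lea eax,[eax+eax]; lea eax,[eax+eax*4] }
function New6(X: LongWord): LongWord;
begin
  X := X + X;        { x2 }
  X := X + X * 4;    { x5 }
  New6 := X;
end;

{ FPC 3.x, element size 12: lea eax,[eax*4+0]; lea eax,[eax+eax*8] }
function New12(X: LongWord): LongWord;
begin
  X := X * 4;        { x4 }
  X := X + X * 8;    { x9 }
  New12 := X;
end;

{ FPC 2.6.4, element size 6: lea eax,[eax+eax*2]; add eax,eax }
function Old6(X: LongWord): LongWord;
begin
  X := X + X * 2;    { x3 }
  X := X + X;        { x2 }
  Old6 := X;
end;

{ FPC 2.6.4, element size 12: lea eax,[eax+eax*2]; lea eax,[eax*4+0] }
function Old12(X: LongWord): LongWord;
begin
  X := X + X * 2;    { x3 }
  X := X * 4;        { x4 }
  Old12 := X;
end;

begin
  WriteLn('FPC 3.x,   size 6:  x', New6(1));    { 2 * 5 = 10 }
  WriteLn('FPC 3.x,   size 12: x', New12(1));   { 4 * 9 = 36 }
  WriteLn('FPC 2.6.4, size 6:  x', Old6(1));    { 3 * 2 = 6 }
  WriteLn('FPC 2.6.4, size 12: x', Old12(1));   { 3 * 4 = 12 }
end.
```

In both FPC 3.x sequences the two factors multiply to the wrong product (2 × 5 and 4 × 9), whereas the FPC 2.6.4 sequences factor 6 and 12 as 3 × 2 and 3 × 4.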

Possible fixes

The peephole optimizer probably optimizes multiplications by 6 and 12 incorrectly. However, in the file \fpc-3.2.2\compiler\i386\aoptcpu.pas of the source, the replacement seems to be correct, and multiplying an integer variable by a constant of 6 or 12 also gives correct results. Perhaps the array offset calculation uses another code path?
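
For comparison, a control test of the plain integer multiplication could look like the sketch below. This is an assumed shape, not the exact program used for the observation above; the program name is made up. It should be compiled with the same -O1 -Op80386 -Oopeephole switches as test.pas.

```pascal
program MulControl;
{ Control test (sketch): multiplies a run-time integer by 6 and by 12 directly, }
{ without going through array indexing. }
var
  I: Integer;
begin
  for I := 0 to 3 do
    WriteLn(I, ' * 6 = ', I * 6, ', ', I, ' * 12 = ', I * 12);
end.
```

If this prints the correct products while the array test still shows offsets of 10 and 36 per element, that would support the idea that array indexing reaches the optimizer through a different code path.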

Edited Oct 14, 2022 by JoeForsterSTA
