[ARM / Bug Fix] Fixed incorrectly-encoded ADR instructions (fixes #40472)
Summary
This merge request corrects an issue in the internal ARM assembler that caused inline ADR instructions to be incorrectly encoded when their calculated offsets were greater than or equal to 256 (see issue #40472). ADR instructions are internally mapped onto equivalent ADD instructions that take the program counter as a source register and use a shifter-immediate operand. Instead of converting the offset into that form, the internal assembler wrote the unmodified offset over the bits for both the shifter and the mantissa, so a completely different value was encoded (e.g. 528, bit pattern 0010 0001 0000, would become 16 ror 4 = 1). The external assembler, used with the -a option, doesn't suffer from this problem.
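The arithmetic in the example above can be checked with a small standalone program. This is only an illustrative sketch, not code from the compiler: `DecodeShifterImm` is a hypothetical helper that interprets a 12-bit ARM data-processing immediate field the way the processor does (upper 4 bits are a rotation count in units of 2 bits, lower 8 bits are the value rotated right).

```pascal
program ShifterDecodeDemo;
{$mode objfpc}

{ Hypothetical helper for illustration only.
  Field bits 11..8: rotation count, applied as a rotate right by 2 * count.
  Field bits 7..0:  8-bit value (the "mantissa"). }
function DecodeShifterImm(Field: Word): LongWord;
var
  Rot: Byte;
  Imm8: LongWord;
begin
  Rot := (Field shr 8) and $F;
  Imm8 := Field and $FF;
  Result := RorDWord(Imm8, Rot * 2);
end;

begin
  { Raw offset 528 = $210 written straight into the field:
    rotation count 2, mantissa 16, so 16 ror 4 = 1 }
  WriteLn('Field $210 decodes to ', DecodeShifterImm($210));
  { One valid encoding of 528: mantissa $21 rotated right by 28 bits }
  WriteLn('Field $E21 decodes to ', DecodeShifterImm($E21));
end.
```

The first line prints 1 (the wrong value described above) and the second prints 528.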
System
- Operating system: Linux (Raspberry Pi OS)
- Processor architecture: ARM (HF/v6, likely earlier too)
- Device: Raspberry Pi 400
What is the current bug behavior?
- Inline ARM assembly language that uses ADR instructions may be incorrectly encoded when the calculated offset is 256 or greater.
What is the behavior after applying this patch?
- Such ADR offsets are now converted to equivalent shifter-immediate values, or the assembler raises an error if no such encoding exists.
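As a rough sketch of what such a conversion involves (this is not the patch itself, and `TryEncodeShifterImm` is a hypothetical name), an offset fits a shifter-immediate if rotating it left by some even amount leaves a value no larger than 255. Trying the sixteen possible rotations in order finds an encoding or shows that none exists:

```pascal
program ShifterEncodeDemo;
{$mode objfpc}

{ Hypothetical helper, not the compiler's own routine.
  Returns True and the 12-bit field if Value can be expressed as an
  8-bit value rotated right by an even number of bits. }
function TryEncodeShifterImm(Value: LongWord; out Field: Word): Boolean;
var
  Rot: Integer;
  Rotated: LongWord;
begin
  Result := False;
  Field := 0;
  for Rot := 0 to 15 do
  begin
    { Rotating left by 2 * Rot undoes the hardware's rotate right }
    Rotated := RolDWord(Value, Rot * 2);
    if Rotated <= $FF then
    begin
      Field := Word((LongWord(Rot) shl 8) or Rotated);
      Exit(True);
    end;
  end;
end;

var
  Field: Word;
begin
  if TryEncodeShifterImm(528, Field) then
    WriteLn('528 encodes as $', HexStr(Field, 3))
  else
    WriteLn('528 is not encodable');

  { 257 = $101 spans nine bits, so no rotation fits it into eight }
  if not TryEncodeShifterImm(257, Field) then
    WriteLn('257 is not encodable; an assembler would have to report an error');
end.
```

For 528 this sketch reports the field $E21 (mantissa $21, rotation count 14). A real assembler may pick a different but equivalent encoding.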
Relevant logs and/or screenshots
The new test at tests/webtbs/tw40472.pp showcases the problem in question:
program tw40472;

function AddrCheck(): LongInt; assembler; nostackframe;
asm
  ADR R0, .L528Ahead
  LDR R0, [R0]
  BX LR
.LPadding:
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
  .long 0, 0, 0
.L528Ahead:
  .long 0x5555AAAA
end;

var
  Output: LongInt;

begin
  Output := AddrCheck();
  if Output <> $5555AAAA then
  begin
    WriteLn('ERROR: Expected $5555AAAA but got $', HexStr(Output, 8));
    Halt(1);
  end;
  WriteLn('ok');
end.
On the trunk, after the ADR instruction executes, R0 points partway into the BX LR instruction and, after being dereferenced by LDR, the function returns $00E12FFF.
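The value $00E12FFF can be reproduced by modelling the first few bytes of AddrCheck. This sketch rests on several assumptions that are not stated in the report: the ADR sits at offset 0 of the routine, the program counter reads as the instruction address plus 8 (ARM mode), memory is little-endian, and the unaligned load is tolerated by the processor. `Mem` and `LoadLE32` are hypothetical names used only here.

```pascal
program MisreadDemo;
{$mode objfpc}

const
  { BX LR encodes as $E12FFF1E, stored least significant byte first }
  Mem: array[0..15] of Byte = (
    0, 0, 0, 0,          { offset 0:  ADR R0, ... (contents irrelevant here) }
    0, 0, 0, 0,          { offset 4:  LDR R0, [R0] (contents irrelevant here) }
    $1E, $FF, $2F, $E1,  { offset 8:  BX LR }
    0, 0, 0, 0);         { offset 12: first padding word }

{ Reads a little-endian 32-bit word from any byte offset in Mem }
function LoadLE32(Addr: Integer): LongWord;
begin
  Result := LongWord(Mem[Addr])
    or (LongWord(Mem[Addr + 1]) shl 8)
    or (LongWord(Mem[Addr + 2]) shl 16)
    or (LongWord(Mem[Addr + 3]) shl 24);
end;

const
  AdrAddress  = 0;  { assumed offset of the ADR instruction }
  PcBias      = 8;  { PC reads two instructions ahead in ARM mode }
  WrongOffset = 1;  { 16 ror 4, the value encoded in place of 528 }

begin
  { 0 + 8 + 1 = 9, one byte into BX LR }
  WriteLn('$', HexStr(LongInt(LoadLE32(AdrAddress + PcBias + WrongOffset)), 8));
end.
```

Under these assumptions the program prints $00E12FFF, matching the value seen on the trunk: the upper three bytes of BX LR followed by the first zero byte of the padding.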
Additional notes
- The fix is a little hacky: it specifically singles out the ADR instruction during pass 2 of the assembler/object builder, which arguably goes against the NASM-inspired coding design, so a later refactor may be required (e.g. changing the codes in compile/arm/armins.dat).
- I confess I don't yet fully understand all of the different Thumb and ARM encoding modes or whether they will be negatively affected, so some additional testing may be required to confirm correctness.