[AVR] Interrupt skips to incorrect handler when raised after skipping instruction
Host environment
- Operating system: Windows 10
- Host architecture: x64
- QEMU flavor: qemu-system-avr
- QEMU version: 7.0.50 (v7.0.0-11902-g1d935f4a02-dirty)
Emulated/Virtualized environment
- Architecture: AVR, Bare metal
- GCC Version: avr-gcc.exe (GCC) 10.3.0
Description of problem
If an interrupt is raised after an instruction that can skip the following instruction (for example CPSE), and the skip condition is active, the vector after the correct one is executed instead of the correct one.
This can happen only if the CPSE instruction is at the end of a translation block. Usually it is somewhere inside a block, so a very rare arrangement of code is required to hit this error.
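The mechanism can be illustrated with a toy model. This is only a minimal sketch and not QEMU code: ToyCpu, toy_cpse, toy_interrupt and toy_fetch are hypothetical names, and the model assumes that each vector slot holds one 4-byte JMP and that a pending skip is consumed when the next instruction is fetched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU state; the fields are illustrative, not QEMU's real layout. */
typedef struct {
    uint32_t pc;   /* byte address of the next instruction */
    bool skip;     /* pending "skip the next instruction" request */
} ToyCpu;

enum { VECTOR_SIZE = 4 };  /* assumption: one 4-byte JMP per vector slot */

/* CPSE only records the skip request; the skip itself happens later. */
void toy_cpse(ToyCpu *cpu, uint8_t a, uint8_t b)
{
    cpu->skip = (a == b);
    cpu->pc += 2;
}

/* Naive interrupt entry: replaces pc but leaves the pending skip alone. */
void toy_interrupt(ToyCpu *cpu, int vector)
{
    assert(vector >= 0);
    cpu->pc = (uint32_t)vector * VECTOR_SIZE;
}

/*
 * Fetch step: a pending skip steps over one instruction of insn_len bytes.
 * Returns the address of the instruction that actually executes.
 */
uint32_t toy_fetch(ToyCpu *cpu, uint32_t insn_len)
{
    if (cpu->skip) {
        cpu->skip = false;
        cpu->pc += insn_len;
    }
    return cpu->pc;
}
```

In this model, calling toy_cpse with equal operands at 0xffa, then toy_interrupt for vector 20, then toy_fetch with a 4-byte instruction returns 0x54 instead of 0x50: the stale skip request swallows the JMP in the correct vector slot.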
Steps to reproduce
A real-world scenario is waiting in a busy loop for a std::atomic<bool> set by an interrupt, in a bigger application, with optimized code and a rare code arrangement. The effect is usually landing in __bad_interrupt and a reset, but it can also be execution of another interrupt handler.
A synthetic example:
- There must be an instruction that can skip the following instruction (for example CPSE), with an always-active skip condition
- It must be placed so that it ends up at the end of a translation block.
Example (addresses matter):
ff8: 81 e0 ldi r24, 0x01 ; 1
ffa: 88 13 cpse r24, r24
ffc: 01 c0 rjmp .+2 ; 0x1000
ffe: 80 e0 ldi r24, 0x00 ; 0
1000: 00 00 nop
- It should run in a busy loop to raise the chances of an interrupt arriving at that code
- Any external interrupt should be generated
- The simplest is UART RX on stdin, raised by key presses
A fully working example is attached, with an ELF file, annotated C code, an ASM dump, and a Makefile that allows compiling and running this scenario (I can't guarantee that compiling it yourself will always reproduce the error, because the code can move a bit).
(Please adjust the paths to GCC and QEMU in the Makefile before using it.)
Run it with the command:
./qemu-system-avr -machine arduino-uno -nographic -monitor null -serial stdio -bios fail.elf
Then keep pressing keys until the error happens.
It is largely machine independent; I originally encountered it on a custom ATmega644 machine.
Possible solutions
When an interrupt is raised by avr_cpu_do_interrupt in target/avr/helper.c and the skip condition is active, it probably should either store that condition internally, clear it in the CPU, and restore it after the interrupt ends, or adjust env->pc_w to the correct address before it is replaced by the address of the interrupt vector. Alternatively, skipping could be done in some other way in avr_tr_translate_insn so that it is not postponed until the next instruction.
I tried to patch that, but too deep a knowledge of QEMU internals, code generation and the AVR target is required to do this correctly.
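The first two options can be sketched in a toy model. This is not a QEMU patch: ToyCpu, saved_skip and the three functions are hypothetical names, the 4-byte vector size is an assumption, and the single saved_skip slot does not handle nested interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU state; the fields are illustrative, not QEMU's real layout. */
typedef struct {
    uint32_t pc;       /* byte address of the next instruction */
    bool skip;         /* pending "skip the next instruction" request */
    bool saved_skip;   /* hypothetical slot for option 1 (no nesting) */
} ToyCpu;

enum { VECTOR_SIZE = 4 };  /* assumption: one 4-byte JMP per vector slot */

/* Option 1, entry: stash the pending skip and clear it for the handler. */
uint32_t enter_irq_save_skip(ToyCpu *cpu, int vector)
{
    assert(vector >= 0);
    uint32_t ret = cpu->pc;          /* return address (pushed on a real CPU) */
    cpu->saved_skip = cpu->skip;
    cpu->skip = false;
    cpu->pc = (uint32_t)vector * VECTOR_SIZE;
    return ret;
}

/* Option 1, exit: restore the skip so the interrupted code still skips. */
void leave_irq_restore_skip(ToyCpu *cpu, uint32_t ret)
{
    cpu->pc = ret;
    cpu->skip = cpu->saved_skip;
    cpu->saved_skip = false;
}

/*
 * Option 2: resolve the skip before the return address is taken.
 * next_len is the size in bytes of the instruction that would be skipped.
 */
uint32_t enter_irq_resolve_skip(ToyCpu *cpu, int vector, uint32_t next_len)
{
    assert(vector >= 0);
    if (cpu->skip) {
        cpu->skip = false;
        cpu->pc += next_len;
    }
    uint32_t ret = cpu->pc;
    cpu->pc = (uint32_t)vector * VECTOR_SIZE;
    return ret;
}
```

Option 2 needs the length of the instruction being skipped, and AVR instructions are either 2 or 4 bytes long, so real code would have to decode the next instruction before it can adjust the program counter.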
Possible workaround for users
Add several nop instructions to move the failing code out of the critical page arrangement. It usually works fine, and the problem is hidden by QEMU's internal logic regarding when an interrupt can be raised.
Additional information
Annotated execution log output of in_asm, real-world example:
----------------
IN: _ZNKSt6atomicIbEcvbEv
0x00000ff4: MOVW r31:r30, r25:r24
0x00000ff6: LDDZ r25, Z+0
0x00000ff8: LDI r24, 1
0x00000ffa: CPSE r25, r1 // <-------------------- it must look like this, with CPSE at the end
----------------
IN: _ZNKSt6atomicIbEcvbEv
0x00000ffc: RJMP .+2
----------------
IN: _ZNKSt6atomicIbEcvbEv
0x00001000: RET
...
and then:
// <-------------------- INT 20 raised
...
----------------
IN:
0x00000050: JMP 0x1002 // <-- correct vector loaded...
----------------
IN:
0x00000054: JMP 0x1012 // <-- ...but skipping to one after that...
----------------
IN: __vector_21 // <-- ...and executing incorrect handler
...