[AVR] Incorrect code generated under -O3 for parameter passed by reference
## Summary
16-bit subroutine parameters passed by reference get clobbered under certain conditions when optimization level 3 or higher is active.
## System Information
- **Operating system:** embedded
- **Processor architecture:** AVR
- **Compiler version:** 3.3.1 (11cf24891dee700e81bbb9eeca61f867502f6d7b), also 3.2.2
- **Device:** Microcontroller
## Steps to reproduce
Compile the example project below with -O3 and run the resulting binary in the AVR simulator:
## Example Project
```
program testsimavr;
var
  w, r: word;
procedure func(var a: word; var r: word);
begin
  r := a;
  dec(a);
  inc(r);
end;
begin
  w := 1234;
  func(w, r);
  writeln('w = ', w, ' (expected w = 1233)');
  writeln('r = ', r, ' (expected r = 1235)');
end.
```
## What is the current bug behavior?
```
$ ~/fpc/gitlab/compiler/avr/pp -n @~/fpc/gitlab/fpc.cfg -Wpavrsim -al -O3 testsimavr.pp
$ ~/LazProjs/fp-avrsim-cc/avrsim testsimavr.bin
w = 978 (expected w = 1233)
r = 1235 (expected r = 1235)
```
## What is the expected (correct) behavior?
```
$ ~/fpc/gitlab/compiler/avr/pp -n @~/fpc/gitlab/fpc.cfg -Wpavrsim -al -O2 testsimavr.pp
$ ~/LazProjs/fp-avrsim-cc/avrsim testsimavr.bin
w = 1233 (expected w = 1233)
r = 1235 (expected r = 1235)
```
## Relevant logs and/or screenshots
The bug manifests in procedure `func`, on the line `dec(a)`, specifically when the value of `a` is loaded through its reference. With -O3 the generated code is:
```
# [8] r := a;
movw r30,r24
movw r26,r22
ld r0,Z+
st X+,r0
ld r0,Z
st X,r0
.Ll2:
# [9] dec(a);
ld r18,Z
ldd r19,Z+1
```
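The problem is that `ld r0, Z+` post-increments Z, so after `r := a` Z points at the high byte of `a` rather than at its low byte. Z must therefore be reloaded with the reference to `a` (still held in r25:r24) before the `ld r18, Z` instruction. With -O2 this reload is emitted: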
```
# [8] r := a;
movw r30,r24
movw r26,r22
ld r0,Z+
st X+,r0
ld r0,Z
st X,r0
.Ll2:
# [9] dec(a);
movw r30,r24
ld r18,Z
ldd r19,Z+1
```
Since the correct instruction sequence is generated at lower optimization levels, an optimization enabled at -O3 appears to be responsible for this bug. Possibly this optimization does not account for `ld r0, Z+` modifying Z and therefore eliminates the required `movw r30, r24`. Unfortunately, the specific optimization responsible is not reported by -dDEBUG_AOPTCPU.
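To illustrate the kind of check that would have to hold before a repeated `movw r30, r24` can be dropped, here is a minimal toy sketch. It is not FPC's optimizer code: `TInstr`, `TAddrMode`, `ModifiesPtr` and `ReloadIsRedundant` are hypothetical names, and the model only looks at addressing-mode side effects, ignoring other ways Z could be written.

```
program reloadcheck;
{$mode objfpc}

type
  { Hypothetical, simplified instruction model; not FPC's internal types. }
  TAddrMode = (amPlain, amPostInc, amPreDec, amDisp);
  TInstr = record
    Source: string;   { assembly text, for reference only }
    Ptr: char;        { pointer register used: 'X', 'Y' or 'Z' }
    Mode: TAddrMode;  { addressing mode applied to Ptr }
  end;

function MakeInstr(const ASource: string; APtr: char; AMode: TAddrMode): TInstr;
begin
  Result.Source := ASource;
  Result.Ptr := APtr;
  Result.Mode := AMode;
end;

{ Post-increment and pre-decrement write back to the pointer register;
  plain and displacement addressing leave it unchanged. }
function ModifiesPtr(const Ins: TInstr; Ptr: char): boolean;
begin
  Result := (Ins.Ptr = Ptr) and (Ins.Mode in [amPostInc, amPreDec]);
end;

{ A repeated load of Ptr is only redundant if no instruction in between
  changed Ptr. }
function ReloadIsRedundant(const Seq: array of TInstr; Ptr: char): boolean;
var
  i: integer;
begin
  Result := True;
  for i := Low(Seq) to High(Seq) do
    if ModifiesPtr(Seq[i], Ptr) then
      Exit(False);
end;

var
  Between: array[0..3] of TInstr;
begin
  { The instructions between the two movw r30,r24 in the -O2 listing. }
  Between[0] := MakeInstr('ld r0,Z+', 'Z', amPostInc);
  Between[1] := MakeInstr('st X+,r0', 'X', amPostInc);
  Between[2] := MakeInstr('ld r0,Z', 'Z', amPlain);
  Between[3] := MakeInstr('st X,r0', 'X', amPlain);

  if ReloadIsRedundant(Between, 'Z') then
    writeln('reload of Z looks redundant')
  else
    writeln('reload of Z is required');
end.
```

For the sequence from this report the sketch reports that the reload is required, because `ld r0,Z+` is classified as modifying Z. A check that treated only the destination register `r0` as written would wrongly conclude the opposite.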