x86: Bug fix to MovMov2Mov 6 optimisation exposed by 4012c3db
Summary
This merge request fixes a bug in the MovMov2Mov 6 optimisation that was exposed by the DeepMOVOpt Improvements (commit 4012c3db).
System
- Operating system: Windows, Linux (bug observed on these platforms - others possible)
- Processor architecture: x86_64 (likely i386 too)
What is the current bug behavior?
When Lazarus was built under -O2 and above from commit 4012c3db onwards, the application would instantly crash with an access violation, enter an infinite loop or other undefined behaviour. This was due to MovMov2Mov 6 being incorrectly applied when the register being evaluated was used inside a reference and whose presence wasn't checked - the result was the removal of a MOV instruction that left said register undefined.
What is the behavior after applying this patch?
Lazarus should successfully compile and run under -O2 and above without any problems.
Relevant logs and/or screenshots
Bugged assembly in the intfgraphics unit in Lazarus compilation under x86_64-win64 under -O4 -gl:
# Peephole Optimization: MovMov2MovMov1 done
movq %rcx,48(%rsp)
# Peephole Optimization: MovMov2Mov 6 done
.Ll591:
# [2004] FGetInternalColorProc := @GetColor_Generic;
leaq INTFGRAPHICS$_$TLAZINTFIMAGE_$__$$_GETCOLOR_GENERIC$LONGINT$LONGINT$TFPCOLOR(%rip),%rdx
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
movq %rdx,232(%rcx)
movq %rcx,240(%rax) <-- %rax doesn't get replaced by %rcx and is left undefined when "movq %rcx,%rax" is removed (where the "MovMov2Mov 6" comment is)
.Ll592:
movq 48(%rsp),%rax
Fixed (while still retaining good optimisation):
# Peephole Optimization: MovMov2MovMov1 done
movq %rcx,48(%rsp)
# Peephole Optimization: MovMov2Mov 6 done
.Ll591:
# [2004] FGetInternalColorProc := @GetColor_Generic;
leaq INTFGRAPHICS$_$TLAZINTFIMAGE_$__$$_GETCOLOR_GENERIC$LONGINT$LONGINT$TFPCOLOR(%rip),%rdx
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
movq %rdx,232(%rcx)
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovMov2Mov 6a} <-- this is the missing debug message mentioned in "Additional Notes" below
movq %rcx,240(%rcx)
.Ll592:
movq 48(%rsp),%rax
Additional Notes
- There is a minor code refactor in the merge request that changes ReplaceRegisterInInstruction from a method to a static class function, since it only calls other static methods and doesn't touch the current object's state.
- Another bug fix is applied in the same ball-park as "MovMov2Mov 6" where a
taicpu(hp2).oper[1]^.reg = CurrentReg
condition is checked without first checking if the operand is actually a register - normally it would just correctly return False but there's a very small chance that it's a reference whose address is equal to the register being checked against and causing undefined and difficult-to-reproduce behaviour. This has been fixed by changing the condition toMatchOperand(taicpu(hp2).oper[1]^, CurrentReg)
. - An additional "MovMov2Mov 6a" debug message is now exported to the assembly dumps if DEBUG_AOPTCPU is defined. This was mistakenly omitted if "MovMov2Mov 6" (the bugged optimisation, although "MovMov2Mov 6a" was partly responsible too) was also applied and made it more difficult to identify when a particular optimisation took place.