Skip to content

x86: Bug fix to MovMov2Mov 6 optimisation exposed by 4012c3db

Summary

This merge request fixes a bug in the MovMov2Mov 6 optimisation that was exposed by the DeepMOVOpt Improvements (commit 4012c3db).

System

  • Operating system: Windows, Linux (bug observed on these platforms - others possible)
  • Processor architecture: x86_64 (likely i386 too)

What is the current bug behavior?

When Lazarus was built under -O2 and above from commit 4012c3db onwards, the application would instantly crash with an access violation, enter an infinite loop or other undefined behaviour. This was due to MovMov2Mov 6 being incorrectly applied when the register being evaluated was used inside a reference and whose presence wasn't checked - the result was the removal of a MOV instruction that left said register undefined.

What is the behavior after applying this patch?

Lazarus should successfully compile and run under -O2 and above without any problems.

Relevant logs and/or screenshots

Bugged assembly in the intfgraphics unit in Lazarus compilation under x86_64-win64 under -O4 -gl:

# Peephole Optimization: MovMov2MovMov1 done
	movq	%rcx,48(%rsp)
# Peephole Optimization: MovMov2Mov 6 done
.Ll591:
# [2004] FGetInternalColorProc := @GetColor_Generic;
	leaq	INTFGRAPHICS$_$TLAZINTFIMAGE_$__$$_GETCOLOR_GENERIC$LONGINT$LONGINT$TFPCOLOR(%rip),%rdx
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
	movq	%rdx,232(%rcx)
	movq	%rcx,240(%rax) <-- %rax doesn't get replaced by %rcx and is left undefined when "movq %rcx,%rax" is removed (where the "MovMov2Mov 6" comment is)
.Ll592:
	movq	48(%rsp),%rax

Fixed (while still retaining good optimisation):

# Peephole Optimization: MovMov2MovMov1 done
	movq	%rcx,48(%rsp)
# Peephole Optimization: MovMov2Mov 6 done
.Ll591:
# [2004] FGetInternalColorProc := @GetColor_Generic;
	leaq	INTFGRAPHICS$_$TLAZINTFIMAGE_$__$$_GETCOLOR_GENERIC$LONGINT$LONGINT$TFPCOLOR(%rip),%rdx
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
	movq	%rdx,232(%rcx)
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovMov2Mov 6a} <-- this is the missing debug message mentioned in "Additional Notes" below
	movq	%rcx,240(%rcx)
.Ll592:
	movq	48(%rsp),%rax

Additional Notes

  • There is a minor code refactor in the merge request that changes ReplaceRegisterInInstruction from a method to a static class function, since it only calls other static methods and doesn't touch the current object's state.
  • Another bug fix is applied in the same ball-park as "MovMov2Mov 6" where a taicpu(hp2).oper[1]^.reg = CurrentReg condition is checked without first checking if the operand is actually a register - normally it would just correctly return False but there's a very small chance that it's a reference whose address is equal to the register being checked against and causing undefined and difficult-to-reproduce behaviour. This has been fixed by changing the condition to MatchOperand(taicpu(hp2).oper[1]^, CurrentReg).
  • An additional "MovMov2Mov 6a" debug message is now exported to the assembly dumps if DEBUG_AOPTCPU is defined. This was mistakenly omitted if "MovMov2Mov 6" (the bugged optimisation, although "MovMov2Mov 6a" was partly responsible too) was also applied and made it more difficult to identify when a particular optimisation took place.

Merge request reports