Skip to content

[x86] Bug fix and better coverage for DoSubAddOpt optimisation method

Summary

This merge request primarily fixes a bug in the DoSubAddOpt method that seeks to merge an arithmetic operation on a register with one that appeared right before it (merging with the last instruction rather than the next instruction permits better optimisation cascading). The bug in question is that overflow conditions weren't checked, leading to assembler errors in rare cases (seemingly exclusive to x86_64).

Nevertheless, since it barely added anything to the compiler's code size, it is now also called at the end of OptPass1ADD as well as OptPass1SUB.

More specific changes:

  • Bug fix where overflow conditions weren't checked, leading to assembler errors in rare cases.
  • Renamed from DoSubAddOpt to DoArithCombineOpt.
  • Can now handle ADD instructions as the root, not just SUB, and is now called at the end of OptPass1ADD.
  • Can now handle INC as a previous instruction (could already handle DEC).
  • Debug messages have now been added.

System

  • Processor architecture: i386, x86_64, possibly i8086

What is the current bug behavior?

In very rare situations (triggered while working on another optimisation), an assembler error can be triggered if the combined constant falls outside of the range that the instruction can handle... specifically, 64-bit ADD and SUB instructions can only handle signed 32-bit immediates.

What is the behavior after applying this patch?

Assembler errors should no longer occur, and some very minor optimisation improvement.

Relevant logs and/or screenshots

The bug has only currently been triggered during development of another optimisation, but can probably be triggered in contrived situations.

A couple of minor optimisations were observed: one in the compiler and one in the pasjpeg package.

For the package, in packages/pasjpeg/src/jdsample.pas (-O4 under x86_64-win64) - before:

.Lj61:
	...
	movl	40(%rdx),%r12d
	subl	$3,%r12d
	addl	$1,%r12d
	...
.Lj64:

After:

.Lj61:
	...
	movl	40(%rdx),%r12d
	addl	$-2,%r12d
	...
.Lj64:

add $-2,%r12d could probably be changed to sub $2,%r12d if there's a logical gain, but care has to be taken because a number of flags undergo inversion when ADD and SUB are switched.

For the compiler, in compiler/rgobj.pas - before:

.Lj712:
	...
	movb	%dil,%al
	subb	$1,%al
	testb	%al,%al
	movb	%dil,%al
	subb	$1,%al
	addb	$1,%al
	movb	%al,%dil
	...
.Lj720:
	...

The SUB and ADD instructions cancel each other out, and then OptPass1MOV is able to cancel out the two MOV instructions - after:

.Lj712:
	...
	movb	%dil,%al
	subb	$1,%al
	testb	%al,%al
	...
.Lj720:
	...

the TEST instruction doesn't seem to serve a purpose, so a future optimisation should be researched in order to remove it - PostPeepholeOptTestOr can't yet handle TEST instructions if a conditional instruction (SETcc, Jcc or CMOVcc) doesn't follow.

Merge request reports