x86: Fix to incorrect range handling for ADD/SUB 128 optimisation (!90) · Merge requests · FPC / FPC / FPC Source

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:carry-fix into main Nov 30, 2021

Summary

This patch corrects a bug with !86 (merged) that tends to appear on i386, not properly tracking the flags on the immediate next instruction. Code generation is also improved so any ADC or SBB instruction that follows is also modified to handle the flipped carry bit.

System

Operating system: Windows 32-bit, Linux 32-bit
Processor architecture: i386, possibly x86_64 in rare circumstances

What is the current bug behavior?

On i386, the overflow on adding or subtracting 128 to an Int64 is not properly handled, and some comparisons are incorrect. This caused the test "toverflow1b" to fail in multiple places when run under i386.

What is the behavior after applying this patch?

The "toverflow1b" test should now pass and the optimisation properly and safely applied on i386.

Relevant logs and/or screenshots

In the toverflow1b i386-win32 assembly dump - before:

.Lj11:
	xorl	%eax,%eax
	pushl	$.La1
	pushl	%ebp
	pushl	$__FPC_on_handler
	pushl	%fs:(%eax)
	movl	%esp,%fs:(%eax)
	cmpb	$0,-8(%ebp)
	je	.Lj18
	movl	-4(%ebp),%eax
	# Register edx allocated
	xorl	%edx,%edx
	addl	$-128,%eax
	sbbl	$0,%edx ; <-- The carry flag has flipped meaning, but the instruction isn't modified to compensate for this
	testl	%edx,%edx
	je	.Lj19
	call	fpc_rangeerror
.Lj19:

After:

.Lj11:
	xorl	%eax,%eax
	pushl	$.La1
	pushl	%ebp
	pushl	$__FPC_on_handler
	pushl	%fs:(%eax)
	movl	%esp,%fs:(%eax)
	cmpb	$0,-8(%ebp)
	je	.Lj18
	movl	-4(%ebp),%eax
	xorl	%edx,%edx
	addl	$-128,%eax
	adcl	$-1,%edx ; <-- Instruction modified to handle the carry bit inversion
	testl	%edx,%edx
	je	.Lj19
	call	fpc_rangeerror
.Lj19:

Additional Notes

The only actual bug fix is changing line 10670 from "TmpUsedRegs[R_SPECIALREGISTER].Update(tai(hp1.Next), True);" to "TmpUsedRegs[R_SPECIALREGISTER].Update(tai(p.Next), True);" - having just this will simply cause the optimisation to not be applied if ADC or SBB instructions are present (which is fine - it fixes the bug as is!). The rest of the code improves the optimisation by changing the ADC and SBB instructions so it can be applied (it also modifies a debug message slightly to include the instruction size suffix).

Edited Nov 30, 2021 by J. Gareth "Kit" Moreton

x86: Fix to incorrect range handling for ADD/SUB 128 optimisation