[x86 / Bug Fix] Fixed bug in TryCmpCMovOpts peephole optimization (!576) · Merge requests · FPC / FPC / FPC Source

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:pass2cmptest-fix into main Jan 26, 2024

Summary

This merge request fixes an oversight in the TryCmpCMovOpts routine (called by OptPass2CMP and OptPass2TEST) that caused bad code to be generated in a few circumstances.

System

Operating system: Windows (possibly Linux as well)
Processor architecture: i386 (possibly x86_64 as well)

What is the current bug behavior?

In some situations, the TryCmpCMovOpts would not stop where it was supposed to and end up erroneously moving unrelated MOV instructions and changing their destination registers.

What is the behavior after applying this patch?

The above circumstance should no longer happen and complementary CMOV pairs properly optimised.

Relevant logs and/or screenshots

The compiler bug notably manifested in the hlcgcpu unit, specifically the g_intf_wrapper procedure, although it appeared in other places too. After fixing and re-enabling the optimisation, the following improvement is made (i386-win32, -O4 -CpCOREAVX2) - before:

	...
	movl	$8,%ecx
	cmpb	$2,33(%eax)
# Peephole Optimization: CMOV Block (Branching type)
	cmoval	%ecx,%eax
	movl	$4,%ecx
# Peephole Optimization: CMOV Block (Simple type)
	cmovnal	%ecx,%eax
	pushl	$17039367
# Peephole Optimization: Lea2AddBase done
	addl	%edx,%eax
	pushl	%eax
	pushl	TC_$CGUTILS_$$_CTEMPPOSINVALID
	pushl	$4
	pushl	$0
	movl	-12(%ebx),%eax
	movl	U_$SYMDEF_$$_VOIDSTACKPOINTERTYPE,%ecx
	...

After:

	...
	movl	$8,%ecx
	cmpb	$2,33(%eax)
	movl	$4,%eax
# Peephole Optimization: CMOV Block (Branching type)
# Peephole Optimization: CMovCMov2MovCMov 1
	cmoval	%ecx,%eax
# Peephole Optimization: CMOV Block (Simple type)
	pushl	$17039367
# Peephole Optimization: Lea2AddBase done
	addl	%edx,%eax
	pushl	%eax
	pushl	TC_$CGUTILS_$$_CTEMPPOSINVALID
	pushl	$4
	pushl	$0
	movl	-12(%ebx),%eax
	movl	U_$SYMDEF_$$_VOIDSTACKPOINTERTYPE,%ecx
	...

When it was bugged, the movl U_$SYMDEF_$$_VOIDSTACKPOINTERTYPE,%ecx instruction and a couple of others were moved to before the comparison, and changed the destination register to %eax. Notably, the bug fix also ensures the MOV instruction is placed after the triggering CMP or TEST instruction if the new destination register appears in it.

Improvements also appear elsewhere. In the adler unit for example (i386-win32, -O4 -CpCOREAVX2) - before:

.Lj7:
	cmpl	$3854,%ecx
	cmovbw	%cx,%bx
	movw	$3854,%dx
	cmovnbw	%dx,%bx
	movswl	%bx,%edx
	subl	%edx,%ecx
	jmp	.Lj14

After (note that this time the MOV is placed before the comparison because it doesn't share any registers with it, thus minimising pipeline stalls):

.Lj7:
	movw	$3854,%bx
	cmpl	$3854,%ecx
	cmovbw	%cx,%bx
	movswl	%bx,%edx
	subl	%edx,%ecx
	jmp	.Lj14

The same optimisation is made under x86_64-win64 as well:

	...
.Lj7:
	cmpl	$3854,%r8d
	cmovbw	%r8w,%r11w
	movw	$3854,%dx
	cmovnbw	%dx,%r11w
	movswl	%r11w,%edx
	subl	%edx,%r8d
	jmp	.Lj14
	...

After:

	...
.Lj7:
	movw	$3854,%r11w
	cmpl	$3854,%r8d
	cmovbw	%r8w,%r11w
	movswl	%r11w,%edx
	subl	%edx,%r8d
	jmp	.Lj14
	...

Edited Jan 26, 2024 by J. Gareth "Kit" Moreton

[x86 / Bug Fix] Fixed bug in TryCmpCMovOpts peephole optimization

Summary

System

What is the current bug behavior?

What is the behavior after applying this patch?

Relevant logs and/or screenshots

Merge request reports