Skip to content

[x86 / Bug Fix] Correction and Improvement to TEST/JNE/TEST/JNE peephole optimization (Fixes #40366)

Summary

This merge request fixes a bug in the TEST/JNE/TEST/JNE peephole optimisation that caused certain conditions to be incorrectly reordered or optimised out, thereby causing the wrong branches to be taken in some circumstances.

There are three commits:

  • The first commit is a new test that demonstrates the bug (it's the same one as in #40366 (closed)).
  • The second commit fixes the problem.
  • The third commit makes the TEST/JNE/TEST/JNE optimisation slightly smarter by removing the first jump rather than the second jump if possible. That way, if there are any register deallocations between the second TEST/JNE pair, their position before the conditional jump are preserved, thereby improving future optimisations (it doesn't look like their values need to be preserved through the jump).

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

As explained in #40366 (closed) (and demonstrated in the new test), certain combinations of conditions can be incorrectly optimised under -O3 and cause the wrong code paths to be taken. The compiler itself is affected by this problem.

What is the behavior after applying this patch?

The faulty optimisation is fixed and also improved slightly (fixes #40366 (closed)).

Relevant logs and/or screenshots

For the bug in tcgprocinfo.set_eh_info in the psub unit (x86_64-win64, -O4) - before (non-jump related labels removed):

.section .text.n_psub$_$tcgprocinfo_$__$$_set_eh_info,"ax"
	.balign 16,0x90
.globl	PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO:
.seh_proc PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
	pushq	%rbx
.seh_pushreg %rbx
	leaq	-32(%rsp),%rsp
.seh_stackalloc 32
.seh_endprologue
	movq	%rcx,%rbx
	call	PROCINFO$_$TPROCINFO_$__$$_SET_EH_INFO
	testb	$16,U_$SYSTEMS_$$_TARGET_INFO+60(%rip)
	je	.Lj424
	movl	256(%rbx),%eax
	testl	$20,%eax
	jne	.Lj425
.Lj424:
	...

After - the check of cs_implicit_exceptions in current_settings.moduleswitches is retained and the two bit-tests of flags (stored in %eax during the conditional checks) are not merged (one is convered to BT at a later stage... this itself could use some minor enhancement at a later date):

.section .text.n_psub$_$tcgprocinfo_$__$$_set_eh_info,"ax"
	.balign 16,0x90
.globl	PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO:
.Lc170:
.seh_proc PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
	pushq	%rbx
.seh_pushreg %rbx
.Lc171:
	leaq	-32(%rsp),%rsp
.Lc172:
.seh_stackalloc 32
.seh_endprologue
	movq	%rcx,%rbx
	call	PROCINFO$_$TPROCINFO_$__$$_SET_EH_INFO
	testb	$16,U_$SYSTEMS_$$_TARGET_INFO+60(%rip)
	je	.Lj424
	movl	256(%rbx),%eax
	testl	$4,%eax
	jne	.Lj425
	testl	$1024,U_$GLOBALS_$$_CURRENT_SETTINGS+72(%rip)
	je	.Lj424
	btl	$4,%eax
	jc	.Lj425
.Lj424:
	...

For the third commit that improves the optimisation, it mostly allows the conversion of mov ref,%reg; test x,%reg and similar to test x,ref if the register isn't used afterwards. Such as in ncal - before:

.Lj360:
	movq	176(%rbx),%rax
	cmpb	$0,354(%rax)
	je	.Lj368
	leaq	U_$GLOBALS_$$_CURRENT_SETTINGS(%rip),%rax
	testb	$192,89(%rax)
	je	.Lj371

After:

.Lj360:
	movq	176(%rbx),%rax
	cmpb	$0,354(%rax)
	je	.Lj368
	testb	$192,U_$GLOBALS_$$_CURRENT_SETTINGS+89(%rip)
	je	.Lj371

In ngenutil, a cascade of optimisations occurs - before:

.section .text.n_ngenutil$_$tnodeutils_$__$$_local_varsyms_finalize$tobject$pointer,"ax"
	.balign 16,0x90
.globl	NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER
NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER:
	...
	cmpb	$2,44(%rdx)
	jne	.Lj100
	cmpl	$0,76(%rdx)
	jng	.Lj100
	movl	104(%rdx),%eax
	testl	$258,%eax
	jne	.Lj100
	testl	$268435456,104(%rdx)
	jne	.Lj100
	movq	96(%rdx),%rcx
	call	DEFUTIL_$$_IS_MANAGED_TYPE$TDEF$$BOOLEAN
	testb	%al,%al
	je	.Lj100

After - the MOV/TEST pair is simplified into a TEST, and then another TEST/JNE/TEST/JNE optimisation merges it with a TEST instruction below it that evaluates the same reference:

.section .text.n_ngenutil$_$tnodeutils_$__$$_local_varsyms_finalize$tobject$pointer,"ax"
	.balign 16,0x90
.globl	NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER
NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER:
	...
	cmpb	$2,44(%rdx)
	jne	.Lj100
	cmpl	$0,76(%rdx)
	jng	.Lj100
	testl	$268435714,104(%rdx)
	jne	.Lj100
	movq	96(%rdx),%rcx
	call	DEFUTIL_$$_IS_MANAGED_TYPE$TDEF$$BOOLEAN
	testb	%al,%al
	je	.Lj100
Edited by J. Gareth "Kit" Moreton

Merge request reports