[x86 / Bug Fix] Correction and Improvement to TEST/JNE/TEST/JNE peephole optimization (Fixes #40366)
Summary
This merge request fixes a bug in the TEST/JNE/TEST/JNE
peephole optimisation that caused certain conditions to be incorrectly reordered or optimised out, thereby causing the wrong branches to be taken in some circumstances.
There are three commits:
- The first commit is a new test that demonstrates the bug (it's the same one as in #40366 (closed)).
- The second commit fixes the problem.
- The third commit makes the
TEST/JNE/TEST/JNE
optimisation slightly smarter by removing the first jump rather than the second jump if possible. That way, if there are any register deallocations between the secondTEST/JNE
pair, their position before the conditional jump are preserved, thereby improving future optimisations (it doesn't look like their values need to be preserved through the jump).
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
As explained in #40366 (closed) (and demonstrated in the new test), certain combinations of conditions can be incorrectly optimised under -O3 and cause the wrong code paths to be taken. The compiler itself is affected by this problem.
What is the behavior after applying this patch?
The faulty optimisation is fixed and also improved slightly (fixes #40366 (closed)).
Relevant logs and/or screenshots
For the bug in tcgprocinfo.set_eh_info
in the psub
unit (x86_64-win64, -O4) - before (non-jump related labels removed):
.section .text.n_psub$_$tcgprocinfo_$__$$_set_eh_info,"ax"
.balign 16,0x90
.globl PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO:
.seh_proc PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
pushq %rbx
.seh_pushreg %rbx
leaq -32(%rsp),%rsp
.seh_stackalloc 32
.seh_endprologue
movq %rcx,%rbx
call PROCINFO$_$TPROCINFO_$__$$_SET_EH_INFO
testb $16,U_$SYSTEMS_$$_TARGET_INFO+60(%rip)
je .Lj424
movl 256(%rbx),%eax
testl $20,%eax
jne .Lj425
.Lj424:
...
After - the check of cs_implicit_exceptions in current_settings.moduleswitches
is retained and the two bit-tests of flags
(stored in %eax
during the conditional checks) are not merged (one is convered to BT
at a later stage... this itself could use some minor enhancement at a later date):
.section .text.n_psub$_$tcgprocinfo_$__$$_set_eh_info,"ax"
.balign 16,0x90
.globl PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO:
.Lc170:
.seh_proc PSUB$_$TCGPROCINFO_$__$$_SET_EH_INFO
pushq %rbx
.seh_pushreg %rbx
.Lc171:
leaq -32(%rsp),%rsp
.Lc172:
.seh_stackalloc 32
.seh_endprologue
movq %rcx,%rbx
call PROCINFO$_$TPROCINFO_$__$$_SET_EH_INFO
testb $16,U_$SYSTEMS_$$_TARGET_INFO+60(%rip)
je .Lj424
movl 256(%rbx),%eax
testl $4,%eax
jne .Lj425
testl $1024,U_$GLOBALS_$$_CURRENT_SETTINGS+72(%rip)
je .Lj424
btl $4,%eax
jc .Lj425
.Lj424:
...
For the third commit that improves the optimisation, it mostly allows the conversion of mov ref,%reg; test x,%reg
and similar to test x,ref
if the register isn't used afterwards. Such as in ncal
- before:
.Lj360:
movq 176(%rbx),%rax
cmpb $0,354(%rax)
je .Lj368
leaq U_$GLOBALS_$$_CURRENT_SETTINGS(%rip),%rax
testb $192,89(%rax)
je .Lj371
After:
.Lj360:
movq 176(%rbx),%rax
cmpb $0,354(%rax)
je .Lj368
testb $192,U_$GLOBALS_$$_CURRENT_SETTINGS+89(%rip)
je .Lj371
In ngenutil
, a cascade of optimisations occurs - before:
.section .text.n_ngenutil$_$tnodeutils_$__$$_local_varsyms_finalize$tobject$pointer,"ax"
.balign 16,0x90
.globl NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER
NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER:
...
cmpb $2,44(%rdx)
jne .Lj100
cmpl $0,76(%rdx)
jng .Lj100
movl 104(%rdx),%eax
testl $258,%eax
jne .Lj100
testl $268435456,104(%rdx)
jne .Lj100
movq 96(%rdx),%rcx
call DEFUTIL_$$_IS_MANAGED_TYPE$TDEF$$BOOLEAN
testb %al,%al
je .Lj100
After - the MOV/TEST
pair is simplified into a TEST
, and then another TEST/JNE/TEST/JNE
optimisation merges it with a TEST
instruction below it that evaluates the same reference:
.section .text.n_ngenutil$_$tnodeutils_$__$$_local_varsyms_finalize$tobject$pointer,"ax"
.balign 16,0x90
.globl NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER
NGENUTIL$_$TNODEUTILS_$__$$_LOCAL_VARSYMS_FINALIZE$TOBJECT$POINTER:
...
cmpb $2,44(%rdx)
jne .Lj100
cmpl $0,76(%rdx)
jng .Lj100
testl $268435714,104(%rdx)
jne .Lj100
movq 96(%rdx),%rcx
call DEFUTIL_$$_IS_MANAGED_TYPE$TDEF$$BOOLEAN
testb %al,%al
je .Lj100