Skip to content

[x86] Improved CMP/Jcc redundancy stripping and related fixes

Summary

This merge request improves the TEST/Jcc/TEST set of optimisations by looking further ahead (under -O3 and above) to remove or convert more redundant CMP/Jcc and TEST/Jcc pairs. This is done by noting that they have a deterministic result due to the presence of near-identical pairs appearing earlier in the code (and whose operands have the same value).

Additionally, some related oversights and improvements have been fixed in regards to register tracking, especially with the FLAGS (e.g. RegModifiedByInstruction(NR_DEFAULTFLAGS, ...) returned False on IMUL and IDIV instructions when they actually leave the flags in an undefined state).

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

  • (I)MUL and (I)DIV instructions are more accurately handled when it comes to register tracking (not just the FLAGS register, but all registers).
  • Under -O3 and above, more TEST/Jcc and CMP/Jcc pairs will be removed.

Relevant logs and/or screenshots

Under -O2, x86_64-win64, the sle unit showcases the improved IMUL handling - before:

.section .text.n_sle$_$slegpb$longint$longint$double$double$double$double$longint_$$_decomp$longint$longint,"ax"
	...
.seh_endprologue
	movq	%rcx,%rax
	leal	-1(%r8d),%edx
	imull	-28(%rcx),%edx
	movl	%edx,-24(%rax)
	...

After - the optimisations in OptPass1MOV can now see that %rax and %rcx are not affected by the 2-operand IMUL instruction and so can optimise the two MOV instructions:

.section .text.n_sle$_$slegpb$longint$longint$double$double$double$double$longint_$$_decomp$longint$longint,"ax"
	...
.seh_endprologue
	leal	-1(%r8d),%edx
	imull	-28(%rcx),%edx
	movl	%edx,-24(%rcx)
	...

Under -O4, x86_64-win64, aoptobj shows several instances of the improved TEST/Jcc/TEST optimisations - before:

	...
.Lj43:
	testq	%rsi,%rsi
	je	.Lj27
	cmpb	$19,32(%rsi)
	je	.Lj42
	testq	%rsi,%rsi
	je	.Lj27
	movzbl	32(%rsi),%edx
	movl	$14271372,%eax
	...

After - note that even though the FLAGS get overwritten, because %rsi hasn't changed value, the second branch to .Lj27 cannot possibly be taken because it would have branched at the first .Lj27:

	...
.Lj43:
	testq	%rsi,%rsi
	je	.Lj27
	cmpb	$19,32(%rsi)
	je	.Lj42
	movzbl	32(%rsi),%edx
	movl	$14271372,%eax
	...

avltree has an instance where a label is removed and the removal of the TEST/Jcc pair results in cascade of additional optimisation - before:

.section .text.n_avl_tree$_$tavltree_$_writereporttostream$tstream_$$_writestr$ansistring,"ax"
	.balign 16,0x90
.globl	AVL_TREE$_$TAVLTREE_$_WRITEREPORTTOSTREAM$TSTREAM_$$_WRITESTR$ANSISTRING
AVL_TREE$_$TAVLTREE_$_WRITEREPORTTOSTREAM$TSTREAM_$$_WRITESTR$ANSISTRING:
.Lc451:
.seh_proc AVL_TREE$_$TAVLTREE_$_WRITEREPORTTOSTREAM$TSTREAM_$$_WRITESTR$ANSISTRING
	leaq	-40(%rsp),%rsp
.Lc452:
.seh_stackalloc 40
.seh_endprologue
	movq	%rcx,%rax
	testq	%rdx,%rdx
	je	.Lj613
	movq	%rdx,%r8
	testq	%rdx,%rdx
	je	.Lj646
	movq	-8(%rdx),%r8
.Lj646:
	movq	32(%rcx),%rcx
	movq	(%rcx),%rax
	call	*264(%rax)
.Lj613:
	nop
	leaq	40(%rsp),%rsp
.Lc453:
	ret
.seh_endproc

After - je .Lj646 can never logically branch, so it and the preceding TEST instruction are removed, along with .Lj646 as there are no longer any references to it; OptPass1MOV then simplifies the two MOV instructions either side of the TEST/Jcc pair:

.section .text.n_avl_tree$_$tavltree_$_writereporttostream$tstream_$$_writestr$ansistring,"ax"
	.balign 16,0x90
.globl	AVL_TREE$_$TAVLTREE_$_WRITEREPORTTOSTREAM$TSTREAM_$$_WRITESTR$ANSISTRING
AVL_TREE$_$TAVLTREE_$_WRITEREPORTTOSTREAM$TSTREAM_$$_WRITESTR$ANSISTRING:
.Lc451:
.seh_proc AVL_TREE$_$TAVLTREE_$_WRITEREPORTTOSTREAM$TSTREAM_$$_WRITESTR$ANSISTRING
	leaq	-40(%rsp),%rsp
.Lc452:
.seh_stackalloc 40
.seh_endprologue
	movq	%rcx,%rax
	testq	%rdx,%rdx
	je	.Lj613
	movq	-8(%rdx),%r8
	movq	32(%rcx),%rcx
	movq	(%rcx),%rax
	call	*264(%rax)
.Lj613:
	nop
	leaq	40(%rsp),%rsp
.Lc453:
	ret
.seh_endproc

cfileutl has a more subtle one:

	...
	movq	-8(%r12),%r12
	testq	%r12,%r12
	je	.Lj234
	movq	%r12,%r13
	testq	%r12,%r12 ; <-- Only this instruction gets removed.
	jle	.Lj240
	xorl	%r14d,%r14d
	.p2align 4,,10
	.p2align 3
.Lj241:
	...

In this case, the jump is not removed because the second condition is different and not a subset of the first one. However, the FLAGS are guaranteed to have not changed in between, so the TEST instruction can be removed.

The classes unit has an example where a conditional jump gets converted to an unconditional one due to the condition being an exact inverse - before:

	...
.Lj6031:
	testq	%rax,%rax
	je	.Lj6036
	movl	144(%rsi),%edx
	cmpl	144(%rax),%edx
	jne	.Lj6034
	testq	%rax,%rax
	jne	.Lj6035
.Lj6036:
	...

After:

	...
.Lj6031:
	testq	%rax,%rax
	je	.Lj6036
	movl	144(%rsi),%edx
	cmpl	144(%rax),%edx
	jne	.Lj6034
	jmp	.Lj6035
.Lj6036:
	...

For another IMUL tracking improvement under -O4 in sysutils - before:

	...
	movl	%edx,%r9d
	imull	$100,%r8d,%eax
	leal	(%eax,%ecx),%r8d
	cmpl	$10,%r10d
	jnb	.Lj7062
	addl	$3,%r10d
	jmp	.Lj7063
	.p2align 4,,10
	.p2align 3
.Lj7062:
	subl	$9,%r10d
	addl	$1,%r8d
.Lj7063:
	movw	%r8w,(%rbx)
	movw	%r10w,(%rsi)
	movw	%r9w,(%rdi)
.Lj7056:
	...

After - OptPass1MOV is able to look so far ahead, and beyond the jumps and labels, and see that %r9d and %edx have the same value throughout this entire block and so can replace %r9w with %dx at the end and remove the first MOV:

	...
	imull	$100,%r8d,%eax
	leal	(%eax,%ecx),%r8d
	cmpl	$10,%r10d
	jnb	.Lj7062
	addl	$3,%r10d
	jmp	.Lj7063
	.p2align 4,,10
	.p2align 3
.Lj7062:
	subl	$9,%r10d
	addl	$1,%r8d
.Lj7063:
	movw	%r8w,(%rbx)
	movw	%r10w,(%rsi)
	movw	%dx,(%rdi)
.Lj7056:
	...
Edited by J. Gareth "Kit" Moreton

Merge request reports