
Draft: [x86] Extensions to ADD and SUB optimisations.

Summary

This merge request extends the instruction search in parts of the x86 peephole optimizer in an attempt to merge ADD and SUB instructions with each other, or to combine them with MOV instructions to create LEA instructions. It also fixes a minor bug in OptPass1SUB where an instruction wasn't converted to ADD when it should have been (although, due to an oversight that is also rectified as part of these optimisations, this code was never triggered).

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Some ADD and SUB instructions now get merged, and there are more opportunities for MOV/ADD and MOV/SUB to get converted into LEA instructions.

Relevant logs and/or screenshots

In bufdataset.s under x86_64-win64, -O4 - before:

.section .text.n_bufdataset_$$_dbcomparevarbytes$pointer$pointer$longint$tlocateoptions$$int64,"ax"
	...
	movq	%rcx,%rsi
	movq	%rdx,%rdi
	movzwl	(%rcx),%r12d
	movzwl	(%rdx),%r13d
	addq	$2,%rsi
	addq	$2,%rdi
	...

Now that GetNextInstructionUsingReg is used internally, each of the two ADD instructions is merged into its corresponding MOV to form an LEA - after:

.section .text.n_bufdataset_$$_dbcomparevarbytes$pointer$pointer$longint$tlocateoptions$$int64,"ax"
	...
	leaq	2(%rcx),%rsi
	leaq	2(%rdx),%rdi
	movzwl	(%rcx),%r12d
	movzwl	(%rdx),%r13d
	...
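The MOV/ADD-to-LEA fold can be sketched in Python as follows. The sketch is illustrative only and does not reproduce the compiler's Pascal code. `Instr`, `next_using_reg`, `mov_add_to_lea` and the `"load"` mnemonic are hypothetical. Registers use plain 64-bit names with no sub-register aliasing, and the flags written by the ADD are assumed to be dead. The forward scan loosely mirrors the idea behind GetNextInstructionUsingReg: skip instructions that don't touch the destination register, then fold only if the first one that does is an ADD and the source register is not overwritten in between.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str        # hypothetical mnemonics: "mov", "add", "load", "lea"
    dst: str       # destination register
    src: str = ""  # source or base register ("" if none)
    imm: int = 0   # immediate for "add", displacement for "lea"

    def reads(self) -> set:
        regs = {self.src} if self.src else set()
        if self.op == "add":
            regs.add(self.dst)  # ADD also reads its destination
        return regs

    def writes(self) -> set:
        return {self.dst}


def next_using_reg(code, start, reg):
    """Index of the first instruction after `start` that reads or writes
    `reg`, or None (loosely modelled on GetNextInstructionUsingReg)."""
    for j in range(start + 1, len(code)):
        if reg in code[j].reads() | code[j].writes():
            return j
    return None


def mov_add_to_lea(code):
    """Rewrite 'mov src,dst ... add $imm,dst' as 'lea imm(src),dst'.
    Assumes the flags set by the ADD are never read afterwards."""
    code = list(code)
    i = 0
    while i < len(code):
        ins = code[i]
        if ins.op == "mov" and ins.src:
            j = next_using_reg(code, i, ins.dst)
            if (j is not None and code[j].op == "add"
                    and not any(ins.src in m.writes() for m in code[i + 1:j])):
                # The source still holds its original value at the ADD,
                # so the ADD can be folded into the MOV.
                code[i] = Instr("lea", ins.dst, ins.src, code[j].imm)
                del code[j]
        i += 1
    return code


# The bufdataset sequence from the listing above:
before = [
    Instr("mov", "rsi", "rcx"),   # movq %rcx,%rsi
    Instr("mov", "rdi", "rdx"),   # movq %rdx,%rdi
    Instr("load", "r12", "rcx"),  # movzwl (%rcx),%r12d
    Instr("load", "r13", "rdx"),  # movzwl (%rdx),%r13d
    Instr("add", "rsi", imm=2),   # addq $2,%rsi
    Instr("add", "rdi", imm=2),   # addq $2,%rdi
]
after = mov_add_to_lea(before)  # two LEAs followed by the two loads
```

The same fold covers the jcparam case, where the ADD sits many instructions after the MOV. If a store or load in between overwrote %rcx, the fold would be rejected.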

A slightly more extreme version (although with the instructions in reverse order) occurs in generics.hashes.s - before:

.section .text.n_generics.hashes_$$_hashword2$plongword$int64$longword$longword,"ax"
	...
	leaq	(,%rdx,4),%rax
	subl	$559038737,%eax
	addl	(%r8),%eax
	movl	%eax,%r10d
	movl	%eax,%r11d
	...

After (this removes two pipeline stalls at the cost of larger code, so it won't be performed under -Os):

.section .text.n_generics.hashes_$$_hashword2$plongword$int64$longword$longword,"ax"
	...
	leaq	(,%rdx,4),%rax
	addl	(%r8),%eax
	leal	-559038737(%eax),%r10d
	leal	-559038737(%eax),%r11d
	subl	$559038737,%eax
	...
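Moving the SUBL past the ADDL is valid because addition and subtraction modulo 2^32 commute. Subtracting 559038737 before or after adding (%r8) gives the same 32-bit value, so each copy can apply the subtraction itself through an LEA. The Python sketch below checks this on random inputs. It models register values only, not flags: the final flags come from ADDL before the change and from SUBL after it, so the sketch assumes they are not read. The function names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF
K = 559038737  # the constant from the listing


def before(rdx, mem):
    eax = (rdx * 4) & MASK32    # leaq (,%rdx,4),%rax (only the low 32 bits matter)
    eax = (eax - K) & MASK32    # subl $559038737,%eax
    eax = (eax + mem) & MASK32  # addl (%r8),%eax
    r10d = eax                  # movl %eax,%r10d
    r11d = eax                  # movl %eax,%r11d
    return eax, r10d, r11d


def after(rdx, mem):
    eax = (rdx * 4) & MASK32    # leaq (,%rdx,4),%rax
    eax = (eax + mem) & MASK32  # addl (%r8),%eax
    r10d = (eax - K) & MASK32   # leal -559038737(%eax),%r10d
    r11d = (eax - K) & MASK32   # leal -559038737(%eax),%r11d
    eax = (eax - K) & MASK32    # subl $559038737,%eax
    return eax, r10d, r11d


rng = random.Random(0)
for _ in range(10_000):
    rdx = rng.getrandbits(64)  # full 64-bit input to the first LEAQ
    mem = rng.getrandbits(32)  # the 32-bit value at (%r8)
    assert before(rdx, mem) == after(rdx, mem)
```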

In jcparam - before:

.section .text.n_jcparam_$$_fill_a_scan$hxangihuyymf,"ax"
	...
	movq	%rcx,%rax
	movl	$1,(%rcx)
	movl	%edx,4(%rcx)
	movl	%r8d,20(%rcx)
	movl	%r9d,24(%rcx)
	movl	48(%rsp),%edx
	movl	%edx,28(%rcx)
	movl	56(%rsp),%edx
	movl	%edx,32(%rcx)
	addq	$36,%rax
	popq	%rbp
	ret

After:

.section .text.n_jcparam_$$_fill_a_scan$hxangihuyymf,"ax"
	...
	leaq	36(%rcx),%rax
	movl	$1,(%rcx)
	movl	%edx,4(%rcx)
	movl	%r8d,20(%rcx)
	movl	%r9d,24(%rcx)
	movl	48(%rsp),%edx
	movl	%edx,28(%rcx)
	movl	56(%rsp),%edx
	movl	%edx,32(%rcx)
	popq	%rbp
	ret

In jdsample, in a couple of places, a SUB/ADD pair gets merged into a single instruction - before:

	...
	movl	40(%rdx),%r12d
	subl	$3,%r12d
	addl	$1,%r12d
	.p2align 4,,10
	.p2align 3
.Lj64:

After:

	...
	movl	40(%rdx),%r12d
	subl	$2,%r12d
	.p2align 4,,10
	.p2align 3
.Lj64:
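Merging consecutive immediate ADD and SUB operations on the same register means summing their signed immediates and wrapping the result at the operand size. For example, `subl $3` followed by `addl $1` nets to -2. The Python sketch below illustrates this arithmetic. `merge_add_sub` is a hypothetical helper, not the optimizer's code. It assumes nothing between the two instructions reads the register or the flags, and it ignores immediate-encoding limits.

```python
def merge_add_sub(ops, width):
    """Fold consecutive ADD/SUB-immediate operations on one register.

    `ops` is a list of ("add" | "sub", imm) pairs and `width` is the operand
    size in bits. Returns a single ("add" | "sub", imm) pair, or None if the
    operations cancel out completely."""
    mask = (1 << width) - 1
    total = 0
    for op, imm in ops:
        total += imm if op == "add" else -imm
    total &= mask                    # arithmetic wraps at the operand size
    if total == 0:
        return None                  # e.g. subb $1 then addb $1
    if total >> (width - 1):         # top bit set: treat as a negative amount
        return ("sub", (1 << width) - total)
    return ("add", total)


print(merge_add_sub([("sub", 3), ("add", 1)], 32))  # ('sub', 2)
```

Emitting SUB with a positive immediate for negative totals matches the jdsample output above. The sketch makes no claim about how the optimizer actually chooses between the two forms.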

This is even more ridiculous in rgobj - before:

	subb	$1,%al
	testb	%al,%al
	movb	%dil,%al
	subb	$1,%al
	addb	$1,%al
	movb	%al,%dil
	.p2align 4,,10
	.p2align 3
.Lj714:

After:

	subb	$1,%al
	testb	%al,%al
	.p2align 4,,10
	.p2align 3
.Lj714:

(Interestingly, a TEST instruction remains here even though nothing checks the flags it sets. This is caused by "# Peephole Optimization: Cmpcc2Testcc - condition AE/NB/NC/NO --> Always", which is something to fix later.)
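Why the four rgobj instructions can disappear can be checked by brute force over every pair of byte values. Once `subb $1` and `addb $1` cancel, the MOV pair leaves %dil unchanged. %al, however, ends up holding a different value, so the removal is only correct if %al is not read after .Lj714. This Python sketch takes that as an assumption, and it also treats the flags as unread. `before` and `after` are hypothetical names for the two listings.

```python
def before(al, dil):
    al = (al - 1) & 0xFF   # subb $1,%al
    # testb %al,%al only sets flags (assumed unread here)
    al = dil               # movb %dil,%al
    al = (al - 1) & 0xFF   # subb $1,%al
    al = (al + 1) & 0xFF   # addb $1,%al
    dil = al               # movb %al,%dil
    return al, dil


def after(al, dil):
    al = (al - 1) & 0xFF   # subb $1,%al
    # testb %al,%al
    return al, dil


pairs = [(a, d) for a in range(256) for d in range(256)]
# %dil is identical for every possible input...
assert all(before(a, d)[1] == after(a, d)[1] for a, d in pairs)
# ...but %al is not, so the rewrite relies on %al being dead afterwards.
assert any(before(a, d)[0] != after(a, d)[0] for a, d in pairs)
```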

Edited by J. Gareth "Kit" Moreton
