Draft: [x86] Extensions to ADD and SUB optimisations.
Summary
This merge request extends the search in parts of the x86 peephole optimizer in an attempt to merge ADD and SUB instructions, or to combine them with MOV instructions to create LEA instructions. It also fixes a minor bug in OptPass1SUB where an instruction wasn't converted to ADD when it should have been (although, due to an oversight that was rectified as part of these optimisations, this was never triggered).
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Some ADD and SUB instructions now get merged, and there are more opportunities for MOV/ADD and MOV/SUB to get converted into LEA instructions.
Relevant logs and/or screenshots
In bufdataset.s under x86_64-win64, -O4 - before:
.section .text.n_bufdataset_$$_dbcomparevarbytes$pointer$pointer$longint$tlocateoptions$$int64,"ax"
...
movq %rcx,%rsi
movq %rdx,%rdi
movzwl (%rcx),%r12d
movzwl (%rdx),%r13d
addq $2,%rsi
addq $2,%rdi
...
Now that GetNextInstructionUsingReg is used internally, the two ADD instructions get merged into the MOVs - after:
.section .text.n_bufdataset_$$_dbcomparevarbytes$pointer$pointer$longint$tlocateoptions$$int64,"ax"
...
leaq 2(%rcx),%rsi
leaq 2(%rdx),%rdi
movzwl (%rcx),%r12d
movzwl (%rdx),%r13d
...
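The MOV/ADD to LEA fold above can be modelled with a short Python sketch. This is a toy illustration, not the compiler's Pascal code: the `Ins` tuple and `fold_mov_add_to_lea` are hypothetical names, registers are matched by plain substring (so sub-registers such as %esi vs %rsi are not handled), and the sketch assumes the flags set by the ADD are not needed afterwards, since LEA leaves them untouched. The inner loop only imitates the idea of looking ahead for the next instruction that uses the destination register.

```python
from typing import List, NamedTuple


class Ins(NamedTuple):
    op: str   # mnemonic, e.g. "movq"
    src: str  # source operand in AT&T syntax, e.g. "%rcx", "$2", "(%rcx)"
    dst: str  # destination operand


def fold_mov_add_to_lea(code: List[Ins]) -> List[Ins]:
    """Fold 'movq %a,%b ... addq $n,%b' into 'leaq n(%a),%b' (toy model)."""
    out = list(code)
    i = 0
    while i < len(out):
        first = out[i]
        if first.op == "movq" and first.src.startswith("%") and first.dst.startswith("%"):
            # Look ahead for the next instruction that mentions the destination.
            for j in range(i + 1, len(out)):
                nxt = out[j]
                if first.dst in nxt.src or first.dst in nxt.dst:
                    if nxt.op == "addq" and nxt.src.startswith("$") and nxt.dst == first.dst:
                        disp = int(nxt.src[1:])
                        out[i] = Ins("leaq", f"{disp}({first.src})", first.dst)
                        del out[j]
                    break  # any other use of the destination ends the search
                if nxt.dst == first.src:
                    break  # the source register is overwritten, so stop
        i += 1
    return out


before = [
    Ins("movq", "%rcx", "%rsi"),
    Ins("movq", "%rdx", "%rdi"),
    Ins("movzwl", "(%rcx)", "%r12d"),
    Ins("movzwl", "(%rdx)", "%r13d"),
    Ins("addq", "$2", "%rsi"),
    Ins("addq", "$2", "%rdi"),
]
after = fold_mov_add_to_lea(before)
```

Reads of the source register between the MOV and the ADD (the two MOVZWL instructions here) do not block the fold; only a write to the source or another use of the destination does.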
A slightly more extreme version (although with the instructions in reverse order) occurs in generics.hashes.s - before:
.section .text.n_generics.hashes_$$_hashword2$plongword$int64$longword$longword,"ax"
...
leaq (,%rdx,4),%rax
subl $559038737,%eax
addl (%r8),%eax
movl %eax,%r10d
movl %eax,%r11d
...
After (two pipeline stalls are removed despite the increase in code size - this won't be performed under -Os):
.section .text.n_generics.hashes_$$_hashword2$plongword$int64$longword$longword,"ax"
...
leaq (,%rdx,4),%rax
addl (%r8),%eax
leal -559038737(%eax),%r10d
leal -559038737(%eax),%r11d
subl $559038737,%eax
...
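As a sanity check on why this reordering is safe, the two sequences can be compared with a small Python model of 32-bit wrap-around arithmetic. It is only a sketch: the function names are hypothetical, only the low 32 bits of the registers are modelled, and flags are ignored. The value loaded from (%r8) is passed in as `mem`.

```python
MASK32 = 0xFFFFFFFF
C = 559038737  # the constant from the listing


def before_seq(rdx: int, mem: int):
    eax = (rdx * 4) & MASK32   # leaq (,%rdx,4),%rax  (low 32 bits)
    eax = (eax - C) & MASK32   # subl $559038737,%eax
    eax = (eax + mem) & MASK32 # addl (%r8),%eax
    r10d = eax                 # movl %eax,%r10d
    r11d = eax                 # movl %eax,%r11d
    return eax, r10d, r11d


def after_seq(rdx: int, mem: int):
    eax = (rdx * 4) & MASK32    # leaq (,%rdx,4),%rax  (low 32 bits)
    eax = (eax + mem) & MASK32  # addl (%r8),%eax
    r10d = (eax - C) & MASK32   # leal -559038737(%eax),%r10d
    r11d = (eax - C) & MASK32   # leal -559038737(%eax),%r11d
    eax = (eax - C) & MASK32    # subl $559038737,%eax
    return eax, r10d, r11d


# Spot-check a few inputs, including values that wrap around.
samples = [(0, 0), (1, 2), (0x40000000, 0xFFFFFFFF), (123456789, 987654321)]
all_equal = all(before_seq(r, m) == after_seq(r, m) for r, m in samples)
```

Because addition and subtraction commute modulo 2^32, all three registers end up with the same values; in the second sequence the two LEAs and the final SUB depend only on the ADD, not on each other.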
In jcparam - before:
.section .text.n_jcparam_$$_fill_a_scan$hxangihuyymf,"ax"
...
movq %rcx,%rax
movl $1,(%rcx)
movl %edx,4(%rcx)
movl %r8d,20(%rcx)
movl %r9d,24(%rcx)
movl 48(%rsp),%edx
movl %edx,28(%rcx)
movl 56(%rsp),%edx
movl %edx,32(%rcx)
addq $36,%rax
popq %rbp
ret
After:
.section .text.n_jcparam_$$_fill_a_scan$hxangihuyymf,"ax"
...
leaq 36(%rcx),%rax
movl $1,(%rcx)
movl %edx,4(%rcx)
movl %r8d,20(%rcx)
movl %r9d,24(%rcx)
movl 48(%rsp),%edx
movl %edx,28(%rcx)
movl 56(%rsp),%edx
movl %edx,32(%rcx)
popq %rbp
ret
In jdsample, in a couple of places, a pair of ADD and SUB instructions gets merged - before:
...
movl 40(%rdx),%r12d
subl $3,%r12d
addl $1,%r12d
.p2align 4,,10
.p2align 3
.Lj64:
After:
...
movl 40(%rdx),%r12d
subl $2,%r12d
.p2align 4,,10
.p2align 3
.Lj64:
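The ADD/SUB merge shown above amounts to summing signed constants. The following Python sketch models it under stated assumptions: `merge_add_sub` and the tuple layout are hypothetical, only adjacent integer ADD/SUB instructions with an immediate operand on the same register and the same size suffix are considered, overflow of the merged immediate is not checked, and the flags are assumed not to be needed afterwards (the carry flag can differ after merging).

```python
from typing import List, Optional, Tuple

Ins = Tuple[str, int, str]  # (mnemonic, immediate, register)


def signed_delta(ins: Ins) -> Optional[int]:
    """Return the signed change an instruction applies, or None if not ADD/SUB."""
    op, imm, _ = ins
    if len(op) == 4 and op[:3] == "add":
        return imm
    if len(op) == 4 and op[:3] == "sub":
        return -imm
    return None


def merge_add_sub(code: List[Ins]) -> List[Ins]:
    out: List[Ins] = []
    for ins in code:
        prev = out[-1] if out else None
        if (prev is not None
                and signed_delta(prev) is not None
                and signed_delta(ins) is not None
                and prev[2] == ins[2]            # same register
                and prev[0][3:] == ins[0][3:]):  # same size suffix
            out.pop()
            net = signed_delta(prev) + signed_delta(ins)
            suffix = ins[0][3:]
            if net > 0:
                out.append(("add" + suffix, net, ins[2]))
            elif net < 0:
                out.append(("sub" + suffix, -net, ins[2]))
            # net == 0: the pair cancels out and both instructions are dropped
        else:
            out.append(ins)
    return out


merged = merge_add_sub([("subl", 3, "%r12d"), ("addl", 1, "%r12d")])
cancelled = merge_add_sub([("subb", 1, "%al"), ("addb", 1, "%al")])
```

Here `merged` holds a single `subl` of 2 on %r12d, and a pair whose constants sum to zero disappears entirely.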
This is even more ridiculous in rgobj - before:
subb $1,%al
testb %al,%al
movb %dil,%al
subb $1,%al
addb $1,%al
movb %al,%dil
.p2align 4,,10
.p2align 3
.Lj714:
After:
subb $1,%al
testb %al,%al
.p2align 4,,10
.p2align 3
.Lj714:
(There is an interesting situation here where a TEST instruction is present without anything checking the condition flags afterwards. This is due to "# Peephole Optimization: Cmpcc2Testcc - condition AE/NB/NC/NO --> Always" - something to fix later.)