[x86] POPCNT and extraneous MOV optimisations

Summary

This merge request adds some optimisations for the POPCNT and LZCNT instructions and closely related sequences:

  • for "popcnt %reg1,%reg2; test %reg2,%reg2", the test instruction gets removed (it also works for "test %reg1,%reg1"). Similarly for "lzcnt %reg1,%reg2; test %reg2,%reg2" (although not "test %reg1,%reg1" this time). This is a simple extension of PostPeepholeOptTestOr.
  • At the end of OptPass1MOV there is now a 'backward optimisation' that looks for "func (oper),%reg1; mov %reg1,%reg2; (dealloc %reg1)" and changes it to "func (oper),%reg2". In this instance 'func' is any operation, other than CMOV (which might not write to its destination), that has the Rop1 and Wop2 flags. It was originally designed to optimise POPCNT assignments, but it also optimises things like "cvtsd2si %xmm0,%rax; movq %rax,%rcx".
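
As a hypothetical illustration of the first pattern (registers and label chosen for the example), a sequence such as:

	popcntl	%eax,%edx
	testl	%edx,%edx
	je	.Lj1

becomes:

	popcntl	%eax,%edx
	je	.Lj1

since POPCNT already sets the zero flag when its result is zero, the TEST contributes nothing.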
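
Likewise, the backward optimisation collapses a hypothetical POPCNT assignment such as:

	popcntq	%rax,%rdx
	movq	%rdx,%rbx

(where %rdx is deallocated afterwards) into:

	popcntq	%rax,%rbx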

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

  • Some sequences of code that use POPCNT and LZCNT are now more efficient.
  • Some more extraneous MOV instructions are removed.

Additional notes

  • Because the POPCNT/TEST optimisation is not triggered anywhere in the compiler, RTL or packages, a couple of new tests have been introduced to evaluate the optimisations, namely "tests/test/opt/tpopcnt1.pp" and "tests/test/opt/tpopcnt2.pp".

Relevant logs and/or screenshots

The Variants unit compiled under "-CpCOREAVX -OpCOREAVX -CfAVX" receives the most improvement, with many optimisations like the following. Before:

	vdivsd	%xmm0,%xmm6,%xmm0
	vmulsd	%xmm0,%xmm8,%xmm0
	vcvtsd2si	%xmm0,%rax
	movq	%rax,%rbx
	jmp	.Lj689
	.p2align 4,,10
	.p2align 3
.Lj708:

After:

	vdivsd	%xmm0,%xmm6,%xmm0
	vmulsd	%xmm0,%xmm8,%xmm0
	vcvtsd2si	%xmm0,%rbx
	jmp	.Lj689
	.p2align 4,,10
	.p2align 3
.Lj708:
