
[x86] Pass iteration and reference counting fix

Summary

This merge request fixes a couple of minor bugs in the x86 peephole optimizer:

  • Fixed a bug where the aoc_ForceNewIteration flag was checked in Pass 2 instead of Pass 1, thereby not having any effect (see the first sketch after this list).
  • Fixed a bug in the "Mov0LblCmp0Je -> Mov0JmpLblCmp0Je" optimisation where a reference count was increased twice instead of once (see the second sketch after this list).
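
Below is a minimal, self-contained Pascal sketch of the control flow the first fix restores. Only the flag name aoc_ForceNewIteration is taken from the actual change; the surrounding types, variables and routine names are hypothetical illustrations, not the real peephole optimizer source.

program pass1sketch;

type
  TAOptCondition = (aoc_ForceNewIteration);
  TAOptConditions = set of TAOptCondition;

var
  AOptConditions: TAOptConditions;

function DoPass1Iteration: Boolean;
begin
  { Placeholder for one sweep of Pass 1.  An individual optimisation that
    wants the whole pass to run again would do:
      Include(AOptConditions, aoc_ForceNewIteration); }
  DoPass1Iteration := False;
end;

procedure RunPass1;
var
  Changed: Boolean;
begin
  repeat
    Exclude(AOptConditions, aoc_ForceNewIteration);
    Changed := DoPass1Iteration;
    { The flag has to be honoured here, inside the Pass 1 loop.  Checking it
      only during Pass 2 (the old behaviour) meant the requested extra
      Pass 1 iteration never actually ran. }
  until not Changed and not (aoc_ForceNewIteration in AOptConditions);
end;

begin
  AOptConditions := [];
  RunPass1;
end.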
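
And a similarly hedged sketch of the second fix: when the "Mov0LblCmp0Je -> Mov0JmpLblCmp0Je" optimisation inserts a new jump to a label, the label's reference count must be bumped exactly once for that new reference. The record and routine below are purely illustrative and are not the FPC assembler types.

program refcountsketch;

type
  TLabelInfo = record
    Name: string;
    RefCount: Integer;   { number of instructions that reference the label }
  end;

procedure InsertJumpToLabel(var Lbl: TLabelInfo);
begin
  { The newly inserted "jmp" adds exactly one reference, so increment once.
    The old code incremented twice, leaving the count too high, which could,
    for instance, stop a later pass from recognising the label as unused. }
  Inc(Lbl.RefCount);
  { ... the jmp instruction itself would be emitted here ... }
end;

var
  L: TLabelInfo;

begin
  L.Name := '.Lj589';
  L.RefCount := 1;
  InsertJumpToLabel(L);
  writeln(L.Name, ' is now referenced ', L.RefCount, ' times');  { expected: 2 }
end.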

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

  • Some optimised x86 programs under -O3 and above aren't as efficient as they could be.
  • No anomalies have been observed from the reference counting bug yet, but it is best to fix it before it causes any.

What is the behavior after applying this patch?

  • Optimisation is better under -O3 and above because the aoc_ForceNewIteration flag is no longer erroneously ignored.

Relevant logs and/or screenshots

A fair number of files show minor changes, usually thanks to DeepMOVOpt attempting to minimise pipeline stalls. The cgobj unit shows something a bit more profound under x86_64-win64 -O4 though. Before:

.Lj589:
	movl	56(%rbp),%eax
	movl	%eax,40(%rsp)
	movl	8(%r13),%eax
	movl	%eax,32(%rsp)
	movzbl	%r12b,%r9d
	movzbl	%dil,%r8d
	movq	%rsi,%rdx
	movq	%rbx,%rcx
	movq	(%rbx),%rax
	call	*704(%rax)
	jmp	.Lj587

After:

.Lj589:
	movl	56(%rbp),%eax
	movl	%eax,40(%rsp)
	movl	8(%r13),%eax
	movl	%eax,32(%rsp)
	movzbl	%r9b,%r9d
	movzbl	%r8b,%r8d
	movq	(%rcx),%rax
	call	*704(%rax)
	jmp	.Lj587

In this case, with an extra iteration of Pass 1 allowed to run, `movq %rsi,%rdx` and `movq %rbx,%rcx` could be converted and then removed completely, since the register pairs are already identical.
