[x86] Pass iteration and reference counting fix
Summary
This merge request fixes a couple of minor bugs in the x86 peephole optimizer:
- Fixed a bug where the `aoc_ForceNewIteration` flag was checked in Pass 2 instead of Pass 1, thereby having no effect.
- Fixed a bug in the "Mov0LblCmp0Je -> Mov0JmpLblCmp0Je" optimisation where a reference count was increased twice instead of once.
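To illustrate the second bug, here is a minimal, hypothetical sketch (not the actual compiler code) of why a label's reference count must be bumped exactly once per newly inserted jump: double-counting makes the label look more referenced than it is, which can block later cleanups such as dead-label removal.

```python
class Label:
    """A jump target with a count of how many instructions reference it."""
    def __init__(self, name):
        self.name = name
        self.refs = 0

def insert_jump(code, label):
    """Append a jump to `label`, bumping its reference count exactly once."""
    code.append(("jmp", label.name))
    label.refs += 1  # once per new reference; incrementing twice here
                     # would be the kind of over-count this MR fixes
```

Usage: after inserting one jump to a fresh label, its `refs` should be exactly 1.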
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
- Some optimised x86 programs under `-O3` and above aren't as efficient as they could be.
- No anomalies have been observed from the reference counting bug yet, but it is good to nip it in the bud.
What is the behavior after applying this patch?
- Optimisation is better under `-O3` and above because the `aoc_ForceNewIteration` flag is no longer erroneously ignored.
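To make the first fix concrete, here is a hedged, hypothetical sketch of a two-pass peephole driver (the names `pass1_step`, `pass2_step`, and the return convention are illustrative, not FPC's actual interface). The point is that a "force new iteration" request must be consumed by the loop around Pass 1; if only Pass 2 inspects it, the extra Pass 1 iteration never happens.

```python
def run_peephole(instructions, pass1_step, pass2_step):
    """Run Pass 1 to a fixed point, then Pass 2 once.

    `pass1_step(instructions, i)` returns True when it wants the whole
    of Pass 1 rerun (analogous to setting aoc_ForceNewIteration).
    Returns the number of Pass 1 iterations performed.
    """
    force_new_iteration = True
    pass1_runs = 0
    while force_new_iteration:
        # The flag is checked and cleared *here*, in the Pass 1 loop;
        # checking it only in Pass 2 would be too late to rerun Pass 1.
        force_new_iteration = False
        pass1_runs += 1
        for i in range(len(instructions)):
            if pass1_step(instructions, i):
                force_new_iteration = True
    for i in range(len(instructions)):
        pass2_step(instructions, i)
    return pass1_runs
```

With a step that requests one rerun, Pass 1 executes twice; this mirrors the cgobj example below, where the extra iteration lets further `mov` eliminations fire.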
Relevant logs and/or screenshots
A fair number of files show minor changes, usually thanks to DeepMOVOpt attempting to minimise pipeline stalls. The cgobj unit shows something a bit more profound under x86_64-win64 -O4 though - before:
```
.Lj589:
	movl	56(%rbp),%eax
	movl	%eax,40(%rsp)
	movl	8(%r13),%eax
	movl	%eax,32(%rsp)
	movzbl	%r12b,%r9d
	movzbl	%dil,%r8d
	movq	%rsi,%rdx
	movq	%rbx,%rcx
	movq	(%rbx),%rax
	call	*704(%rax)
	jmp	.Lj587
```
After:
```
.Lj589:
	movl	56(%rbp),%eax
	movl	%eax,40(%rsp)
	movl	8(%r13),%eax
	movl	%eax,32(%rsp)
	movzbl	%r9b,%r9d
	movzbl	%r8b,%r8d
	movq	(%rcx),%rax
	call	*704(%rax)
	jmp	.Lj587
```
In this case, with an extra iteration of Pass 1 allowed to run, `movq %rsi,%rdx` and `movq %rbx,%rcx` could be converted and removed completely since the register pairs are already identical.