[x86] Inefficiency fixes to OptPass1TEST (!549)

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:optpass1test-inefficiencies into main Nov 29, 2023

Summary

This merge request fixes some inefficiencies that were unearthed by !547 (merged). Occasionally, the merged TEST/JNE/TEST/JNE optimisation depended on the second jump not yet having been optimised, because the sequence takes the form test $x,%reg/ref; jne .lbl1; test $y,%reg/ref; jne .lbl1; jmp .lbl2; .lbl1:. If the second jump was optimised first, or was already in an optimised form, the entire sequence was left unoptimised.

Therefore, alongside some other minor improvements to the method, the following variant is now also optimised: test $x,%reg/ref; jne .lbl1; test $y,%reg/ref; je .lbl2; .lbl1: becomes test $(x or y),%reg/ref; je .lbl2; .lbl1:.
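To illustrate why merging the two masks preserves behaviour, here is a minimal Python sketch that models only the branch decision of both sequences and compares them exhaustively for small operand widths. The function names and the 6-bit width are assumptions made for this illustration; they are not taken from the FPC source, and the sketch does not model how OptPass1TEST itself matches or rewrites instructions.

```python
from itertools import product


def original_target(value, x, y):
    """Model: test $x; jne .lbl1; test $y; je .lbl2; .lbl1:"""
    if (value & x) != 0:      # first TEST clears ZF, so JNE is taken
        return "lbl1"
    if (value & y) == 0:      # second TEST sets ZF, so JE is taken
        return "lbl2"
    return "lbl1"             # neither jump taken: fall through to .lbl1


def merged_target(value, x, y):
    """Model: test $(x or y); je .lbl2; .lbl1:"""
    if (value & (x | y)) == 0:
        return "lbl2"
    return "lbl1"


def variants_agree(bits=6):
    """Compare both models for every value and mask pair of the given width."""
    limit = 1 << bits
    return all(
        original_target(v, x, y) == merged_target(v, x, y)
        for v, x, y in product(range(limit), repeat=3)
    )


if __name__ == "__main__":
    print(variants_agree())
```

Both models reach .lbl2 only when none of the bits in x or y are set in the tested value, which is the condition the single merged TEST checks.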

System

Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Some inefficiencies in the x86 peephole optimizer have been resolved.

Relevant logs and/or screenshots

Besides the cases uncovered in !547 (merged), there are a handful of improvements that are independent of that merge request. For example, in the ncal unit (x86_64-win64, -O4), before:

	...
.Lj1191:
	leaq	U_$GLOBALS_$$_CURRENT_SETTINGS(%rip),%rax
	testb	$64,89(%rax)
	jne	.Lj1195
	testb	$128,89(%rax)
	je	.Lj1189
.Lj1195:
	...

After:

	...
.Lj1191:
	testb	$192,U_$GLOBALS_$$_CURRENT_SETTINGS+89(%rip)
	je	.Lj1189
	...

A lot of similar sequences appear in the compiler source.
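As a worked check of the ncal example, the two immediates 64 and 128 combine to the single mask 192 used in the merged testb. The sketch below, with hypothetical helper names, confirms that the jump to .Lj1189 is taken for the same byte values before and after. It models the branch condition only, not the folding of the leaq into the RIP-relative operand.

```python
MASK_FIRST = 64                          # 0x40, first testb immediate
MASK_SECOND = 128                        # 0x80, second testb immediate
MASK_MERGED = MASK_FIRST | MASK_SECOND   # immediate of the merged testb


def jumps_before(byte):
    """Before: jne .Lj1195 must not be taken, then je .Lj1189 is taken."""
    return (byte & MASK_FIRST) == 0 and (byte & MASK_SECOND) == 0


def jumps_after(byte):
    """After: a single testb with the merged mask followed by je .Lj1189."""
    return (byte & MASK_MERGED) == 0


if __name__ == "__main__":
    print(MASK_MERGED)
    print(all(jumps_before(b) == jumps_after(b) for b in range(256)))
```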

Edited Nov 30, 2023 by J. Gareth "Kit" Moreton
