[x86] Inefficiency fixes to OptPass1TEST

Summary

This merge request fixes some inefficiencies that were unearthed by !547 (merged). Occasionally, the merged TEST/JNE/TEST/JNE optimisation depended on the second jump not yet having been optimised, because it expects a sequence of the form test $x,%reg/ref; jne .lbl1; test $y,%reg/ref; jne .lbl1; jmp .lbl2; .lbl1:. If the second jump was optimised first, or was already in an optimised form, the entire sequence was left unoptimised.

Therefore, besides some other minor improvements in the method, the following variant is now optimised: test $x,%reg/ref; jne .lbl1; test $y,%reg/ref; je .lbl2; .lbl1: becomes test $(x or y),%reg/ref; je .lbl2; .lbl1:
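The equivalence behind this variant can be checked with a small model. The sketch below is not the compiler's code: the function names `original`, `merged` and `all_equivalent` are hypothetical, and each function simply returns the label that control would reach for a given operand value and pair of masks. It assumes both TEST instructions read the same unmodified operand.

```python
def original(value, x, y):
    """Model of: test $x; jne .lbl1; test $y; je .lbl2; .lbl1:"""
    if (value & x) != 0:      # test $x ; jne .lbl1
        return "lbl1"
    if (value & y) == 0:      # test $y ; je .lbl2
        return "lbl2"
    return "lbl1"             # fall through to .lbl1


def merged(value, x, y):
    """Model of: test $(x or y); je .lbl2; .lbl1:"""
    if (value & (x | y)) == 0:  # test $(x or y) ; je .lbl2
        return "lbl2"
    return "lbl1"               # fall through to .lbl1


def all_equivalent(bits=4):
    """Exhaustively compare both models for every value and mask pair."""
    limit = 1 << bits
    return all(
        original(v, x, y) == merged(v, x, y)
        for v in range(limit)
        for x in range(limit)
        for y in range(limit)
    )
```

The small default bit width keeps the exhaustive check fast. The argument does not depend on the width, since a value has a bit in common with x or with y exactly when it has a bit in common with (x or y).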

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

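Some inefficiencies in the x86 peephole optimizer have been resolved. In particular, the TEST/JNE/TEST/JE variant described in the summary is now merged into a single TEST/JE pair.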

Relevant logs and/or screenshots

Besides the cases uncovered in !547 (merged), there are a handful of improvements that are independent of that merge request. For example, in the ncal unit (x86_64-win64, -O4), before:

	...
.Lj1191:
	leaq	U_$GLOBALS_$$_CURRENT_SETTINGS(%rip),%rax
	testb	$64,89(%rax)
	jne	.Lj1195
	testb	$128,89(%rax)
	je	.Lj1189
.Lj1195:
	...

After:

	...
.Lj1191:
	testb	$192,U_$GLOBALS_$$_CURRENT_SETTINGS+89(%rip)
	je	.Lj1189
	...
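As a worked check of the constants in this listing, the two masks 64 and 128 combine to 192, and the single testb takes the je branch for exactly the same byte values as the original pair. The names below are illustrative only and do not come from the compiler.

```python
MASK_A = 64                   # bit 6, first testb in the "before" listing
MASK_B = 128                  # bit 7, second testb in the "before" listing
COMBINED = MASK_A | MASK_B    # immediate used by the single testb afterwards


def je_taken_before(byte):
    # je .Lj1189 is only reached, and only taken, when both tests give zero.
    return (byte & MASK_A) == 0 and (byte & MASK_B) == 0


def je_taken_after(byte):
    # A single test against the combined mask decides the branch.
    return (byte & COMBINED) == 0


# Every possible byte value for which the two listings would disagree.
mismatches = [b for b in range(256)
              if je_taken_before(b) != je_taken_after(b)]
```

This covers only the mask arithmetic. Folding the leaq into a RIP-relative memory operand, which also appears in the "after" listing, is a separate step that is not modelled here.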

Many similar sequences appear in the compiler source.

Edited by J. Gareth "Kit" Moreton