[x86] MovAndTest2Test extension

Summary

This merge request extends the MovAndTest2Test peephole optimisation in two distinct but related ways:

  • mov input,%reg; and num,%reg; test %reg,%reg triplets are now optimised even when the TEST instruction uses a smaller sub-register, provided num is valued such that all of its set bits are covered by that sub-register.
  • When optimising, the TEST instruction is shrunk to the smallest size that num fits into (e.g. $80 fits into a byte). On i386, byte-sized shrinkage is only applied when testing a memory reference or one of %eax, %ecx, %edx or %ebx, since only those registers have addressable low bytes.

These transformations are valid because the preceding AND instruction ensures that the bits not covered by the sub-register are already masked out.
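The size-selection rule can be sketched in Python. This is an illustrative model only, not the compiler's actual implementation; the helper name `smallest_test_size` is hypothetical:

```python
def smallest_test_size(num: int) -> int:
    """Return the smallest operand size (in bits) whose range covers
    every set bit of num -- a sketch of the TEST-shrinkage rule, not
    the compiler's actual code."""
    for bits in (8, 16, 32, 64):
        # If no set bit lies above the low `bits` bits, this size suffices.
        if num & ~((1 << bits) - 1) == 0:
            return bits
    raise ValueError("mask wider than 64 bits")

# $80 has all of its set bits within the low byte, so a byte-sized
# TEST suffices; $180 needs a word-sized TEST.
print(smallest_test_size(0x80))   # 8
print(smallest_test_size(0x180))  # 16
```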

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Generated code around the optimised mov; and; test triplets is smaller and faster.

Relevant logs and/or screenshots

An example of the first optimisation in action (System unit x86_64-win64 under -O4) - before:

.section .text.n_system_$$_rem_pio2$double$double$$int64,"ax"
	...
.Lj1095:
	...
	movl	%eax,%edx
	andl	$1,%edx
	testb	%dl,%dl
	je	.Lj1102

After:

.section .text.n_system_$$_rem_pio2$double$double$$int64,"ax"
	...
.Lj1095:
	...
	testb	$1,%al
	je	.Lj1102
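The replacement is behaviour-preserving because the AND already masked out every bit the narrower TEST cannot see. A quick Python check that both sequences compute the same zero flag (a sketch for illustration, not compiler code):

```python
def zf_before(eax: int) -> bool:
    # movl %eax,%edx ; andl $1,%edx ; testb %dl,%dl
    edx = eax & 0xFFFFFFFF
    edx &= 1
    dl = edx & 0xFF
    # TEST sets ZF when the bitwise AND of its operands is zero.
    return (dl & dl) == 0

def zf_after(eax: int) -> bool:
    # testb $1,%al
    al = eax & 0xFF
    return (al & 1) == 0

# Spot-check a few 32-bit input patterns; only bit 0 can matter.
for x in (0, 1, 2, 0x80, 0xFFFFFFFF, 0x12345678):
    assert zf_before(x) == zf_after(x)
print("equivalent")  # prints "equivalent"
```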

The second optimisation makes improvements in a large number of units. One example occurs multiple times in a row in the compiler's Verbose unit (all of these were MovAndTest2Test optimisations) - before:

.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
	...
	call	fpc_ansistr_incr_ref
	movq	$0,-16(%rbp)
	testl	$1,%ebx
	setneb	%sil
	testl	$2,%ebx
	setneb	%dl
	testl	$1,%ebx
	setneb	%al
	orb	%al,%dl

After:

.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
	...
	call	fpc_ansistr_incr_ref
	movq	$0,-16(%rbp)
	testb	$1,%bl
	setneb	%sil
	testb	$2,%bl
	setneb	%dl
	testb	$1,%bl
	setneb	%al
	orb	%al,%dl

Additional notes

The optimisation that shrinks TEST instructions is in a separate unit because it has not yet been determined whether such partial-register reads incur a performance penalty.

It is also worth noting that there is room for further improvement in the Verbose unit example: testb $1,%bl appears twice with no labels or writes to %bl in between, so the code could be optimised further to:

.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
	...
	call	fpc_ansistr_incr_ref
	movq	$0,-16(%rbp)
	testb	$1,%bl
	setneb	%sil
	setneb	%al
	testb	$2,%bl
	setneb	%dl
	orb	%al,%dl

Additionally, %al is not used afterwards, so %sil can be used in its place, since both registers are set from the same flags:

.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
	...
	call	fpc_ansistr_incr_ref
	movq	$0,-16(%rbp)
	testb	$1,%bl
	setneb	%sil
	testb	$2,%bl
	setneb	%dl
	orb	%sil,%dl
Edited by J. Gareth "Kit" Moreton