[x86] MovAndTest2Test extension
Summary
This merge request extends the MovAndTest2Test
peephole optimisation in two distinct but related ways:
- mov input,%reg; and num,%reg; test %reg,%reg triplets are now optimised if the TEST instruction uses a smaller sub-register and num is valued such that all of its set bits are covered by that sub-register.
- When optimising, the TEST instruction is made as small as possible given the size of num (e.g. $80 will fit into a byte); on i386, byte-sized shrinkage is applied only when testing a reference or %eax, %ecx, %edx or %ebx.

These transformations are valid because the AND instruction that was present beforehand ensures that the bits not covered by the sub-register are masked out.
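The masking argument can be sanity-checked with a quick sketch (Python purely for illustration; zf_test and sub_register are made-up helper names, not anything from the compiler):

```python
import random

def zf_test(a, b):
    # x86 TEST sets ZF when (a AND b) == 0; ZF is all that matters here.
    return (a & b) == 0

def sub_register(value, bits):
    # Low 8- or 16-bit sub-register view of a 32-bit register.
    return value & ((1 << bits) - 1)

# If every set bit of num is covered by the sub-register, then after
# `and num,%reg` a `test %reg,%reg` gives the same ZF as testing num
# against the sub-register of the original input directly.
for _ in range(10000):
    reg = random.getrandbits(32)
    num = random.getrandbits(8)   # num fits in a byte, so a byte-sized TEST is safe
    masked = reg & num            # state after `and num,%reg`
    assert zf_test(masked, masked) == zf_test(num, sub_register(reg, 8))
```

Since num has no bits above bit 7, `num AND reg` and `num AND (reg AND $FF)` are the same value, which is the whole justification for shrinking the TEST.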
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Generated code around optimised mov; and; test triplets is both smaller and faster.
Relevant logs and/or screenshots
An example of the first optimisation in action (System unit, x86_64-win64, compiled with -O4) - before:
.section .text.n_system_$$_rem_pio2$double$double$$int64,"ax"
...
.Lj1095:
...
movl %eax,%edx
andl $1,%edx
testb %dl,%dl
je .Lj1102
After:
.section .text.n_system_$$_rem_pio2$double$double$$int64,"ax"
...
.Lj1095:
...
testb $1,%al
je .Lj1102
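That the collapsed form preserves ZF (the only flag the following `je` reads) can be checked with a small simulation (Python sketch; the function names are made up for illustration):

```python
def before(eax):
    # movl %eax,%edx; andl $1,%edx; testb %dl,%dl
    edx = eax & 0xFFFFFFFF
    edx &= 1                  # AND also sets ZF, but the TEST overwrites it
    return (edx & edx) == 0   # ZF from testb %dl,%dl (%edx <= 1, so %dl == %edx)

def after(eax):
    # single byte-sized TEST of $1 against %al
    al = eax & 0xFF
    return (al & 1) == 0

# ZF agrees for every input, so the je target is unchanged.
for eax in range(0x10000):
    assert before(eax) == after(eax)
```

The mov and and can be dropped entirely here because %edx is not live afterwards; only the flag result is consumed.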
The second optimisation makes improvements in a large number of units. One such example occurs multiple times in a row in the compiler's Verbose unit (these were all MovAndTest2Test
optimisations) - before:
.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
...
call fpc_ansistr_incr_ref
movq $0,-16(%rbp)
testl $1,%ebx
setneb %sil
testl $2,%ebx
setneb %dl
testl $1,%ebx
setneb %al
orb %al,%dl
After:
.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
...
call fpc_ansistr_incr_ref
movq $0,-16(%rbp)
testb $1,%bl
setneb %sil
testb $2,%bl
setneb %dl
testb $1,%bl
setneb %al
orb %al,%dl
Additional notes
The optimisation that shrinks TEST
instructions is kept in a separate unit because it has not yet been determined whether such partial-register reads incur a penalty.
Also worth noting is that there's room for improvement in the Verbose unit example: testb $1,%bl
appears twice with no labels or writes to %bl
in between, so it could be optimised further to:
.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
...
call fpc_ansistr_incr_ref
movq $0,-16(%rbp)
testb $1,%bl
setneb %sil
setneb %al
testb $2,%bl
setneb %dl
orb %al,%dl
Additionally, %al
isn't used afterwards, so %sil
can be used instead, since it is set from the same flags:
.section .text.n_verbose_$$_comment$longint$ansistring,"ax"
...
call fpc_ansistr_incr_ref
movq $0,-16(%rbp)
testb $1,%bl
setneb %sil
testb $2,%bl
setneb %dl
orb %sil,%dl
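The claim that the fully reduced sequence matches the original can be brute-force checked (Python sketch; only ZF matters since setne reads nothing else):

```python
def setne(zf):
    # x86 SETNE writes 1 when ZF is clear, 0 when ZF is set.
    return 0 if zf else 1

def original(bl):
    # testl $1,%ebx / setneb %sil; testl $2,%ebx / setneb %dl;
    # testl $1,%ebx / setneb %al; orb %al,%dl
    # (only the low bits of %ebx matter for these TESTs)
    sil = setne((bl & 1) == 0)
    dl  = setne((bl & 2) == 0)
    al  = setne((bl & 1) == 0)
    dl |= al
    return sil, dl

def reuse_sil(bl):
    # testb $1,%bl / setneb %sil; testb $2,%bl / setneb %dl; orb %sil,%dl
    sil = setne((bl & 1) == 0)
    dl  = setne((bl & 2) == 0)
    dl |= sil                    # %sil mirrors what %al would have held
    return sil, dl

for bl in range(256):
    assert original(bl) == reuse_sil(bl)
```

Both the second setneb and its TEST disappear because nothing between the two identical tests writes %bl or the flags' source operands.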