[Cross-platform] Inverted integer range node-level optimisation
Summary
This merge request slightly extends the if (v >= x) and (v <= y) then
optimisation (where x < y) to also process its logical inverse (courtesy of De Morgan's theorem), if (v < x) or (v > y) then
. This construct appears very frequently in code, testing to see if an integer variable falls outside of a valid range and requires very little modification to existing code, and yet produces much more compact code as a result.
This is a node-level optimisation (specifically taddnode.simplify
) and therefore it benefits all platforms.
System
- Processor architecture: Cross-platform
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Pascal code that checks whether an integer variable falls outside of a given range will now be converted into much more efficient assembly language.
Relevant logs and/or screenshots
The original example was first noted in !612 (merged) and !614 (merged) in the SysUtils
unit, where the following Pascal code under x86_64-win64, -O4:
If (YY<1980) or (YY>2099) then
Result:=0
else
...
...produces:
.section .text.n_sysutils_$$_datetimetofiledate$tdatetime$$int64,"ax"
...
movw 40(%rsp),%dx
cmpw $1980,%dx
setbb %cl
cmpw $2099,%dx
setab %dl
xorl %eax,%eax
orb %dl,%cl
jne .Lj11254
...
When changed to:
Result:=0;
If (YY>=1980) and (YY<=2099) then
...
It instead produces something far more efficient:
xorl %eax,%eax
movzwl 40(%rsp),%edx
subl $1980,%edx
cmpw $119,%dx
jnbe .Lj11254
...
Running the original code with this patch, the following is produced:
movzwl 40(%rsp),%eax
subl $1980,%eax
cmpw $119,%ax
movl $0,%eax
ja .Lj11254
This is close to what the modified Pascal code produces (regarding the conditions on the jump, nbe = a
). Unfortunately the movl $0,%eax
instruction cannot be relcated because %eax
is being used as an intermediate register for the conditional check. Nevertheless, it is a massive improvement over the trunk.
An earlier example in SysUtils
on x86_64-win64, -O4 - before:
.section .text.n_sysutils_$$_incamonth$word$word$word$longint,"ax"
...
cmpl $11,%eax
setgb %cl
testl %eax,%eax
setlb %dl
orb %dl,%cl
je .Lj10538
...
After:
.section .text.n_sysutils_$$_incamonth$word$word$word$longint,"ax"
...
cmpl $11,%eax
jna .Lj10538
...
Outside of SysUtils
, a huge number of units receive improvements, include aoptx86
that has grown massively in size over the years due to individual peephole optimisations. One improvement that appears a few times is in LEA
optimisations to determine if a calculated offset is within the valid range of a signed 32-bit integer. For example - before:
...
.Lj3017:
movq 48(%rsp),%rax
movq 64(%rax),%rax
movq 8(%rax),%rax
movq (%rax),%rax
cmpq $2147483647,%rax
jg .Lj3021
cmpq $-2147483648,%rax
jnl .Lj3023
.Lj3021:
movq 48(%rsp),%rax
...
After:
...
.Lj3017:
movq 48(%rsp),%rax
movq 64(%rax),%rax
movq 8(%rax),%rax
movq (%rax),%rax
subq $-2147483648,%rax
movl $4294967295,%edx
cmpq %rdx,%rax
jna .Lj3022 ; // <-- this was originally .Lj3023 - label numbering is different.
movq 48(%rsp),%rax
...
In the packages, the otherwise tiny zinflate
unit receives an improvement - before:
...
.Lj25:
cmpw $8,%si
setlb %al
cmpw $15,%si
setgb %dl
orb %dl,%al
je .Lj27
...
After:
...
.Lj25:
movzwl %si,%eax
subl $8,%eax
cmpw $7,%ax
jna .Lj27
...
Future Development
It's noted that when dealing with word-sized arguments, the final instructions mix 32-bit and 16-bit registers on x86. This is likely due to some inefficiencies in the OptPass2MOV
routine that will be investigated at a later date.