Draft: [Cross-platform] Typeconv node stripping
Summary
This merge request aims to reduce the number of type conversion nodes where the source and destination types are identical or near-identical. This is mostly to help with constant propagation and other optimisation techniques that tend to get stuck with type conversions, which is also hampering the development and effectiveness of pure functions.
System
- Processor architecture: All
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Constant propagation and other optimisations should be more effective as a result.
Relevant logs and/or screenshots
These relate to x86_64-win64 under -O4.
A lot of units get minor improvements here and there, especially where constant propagation and common subexpression elimination is concerned, and especially in regards to recalling values from the stack.
ascii85 unit - Before:
.Lj364:
...
movq -24(%rbp),%rdx
movl $1024,%eax
movq -24(%rbp),%rdx
movq 24(%rdx),%rdx
subl 16(%rdx),%eax
...
After:
.Lj364:
...
movq -24(%rbp),%rdx
movq 24(%rdx),%rdx
movl $1024,%eax
subl 16(%rdx),%eax
...
The System unit especially gets a notable one:
...
.Lj494:
movl $1,%eax
cmpb $127,(%rcx)
jna .Lj496
movb (%rcx),%r9b
notb %r9b
movzbl %r9b,%r9d
bsrq %r9,%r9
jne .Lj497
movl $255,%r9d
.Lj497:
cmpb $1,%r9b
setbeb %r11b
cmpb $6,%r9b
setaeb %r10b
orb %r10b,%r11b
je .Lj499
negq %rax
ret
.p2align 4,,10
.p2align 3
.Lj499:
...
In the patch, the negq %rax
instruction becomes movq $-1,%rax
because constant propagation is able to identify that the function result in %rax is equal to 1 when it reaches the negation instruction, as seen in the corresponding Pascal code:
result:=1;
{ multiple byte utf-8 code point? }
if p[0]>#127 then
begin
{ bsr searches for the leftmost 1 bit. We are interested in the
leftmost 0 bit, so first invert the value
}
firstzerobit:=bsrbyte(not(byte(p[0])));
{ if there is no zero bit or the first zero bit is the rightmost bit
(bit 0), this is an invalid UTF-8 byte ($ff cannot appear in an
UTF-8-encoded string, and in the worst case bit 1 has to be zero)
Additionally, 5-byte UTF-8 sequences don't exist either, so bit 1
cannot be the first zero-bit either. And bits 6 and 7 can't be 0
either in the first byte.
}
if (firstzerobit<=1) or (firstzerobit>=6) then
begin
result:=-result;
exit;
end;
...
This is also apparent in the node dump for the instruction - before:
<blockn resultdef="$void" pos="1150,11" flags="nf_pass1_done" complexity="255">
<statementn pos="1151,28">
<inlinen resultdef="$void" pos="1151,13" flags="nf_pass1_done,nf_internal" complexity="255" inlinenumber="in_neg_assign_x">
<typeconvn resultdef="Int64" pos="1151,22" flags="nf_pass1_done,nf_absolute" complexity="0" convtype="tc_equal">
<loadn resultdef="Int64" pos="1151,28" flags="nf_pass1_done,nf_modify" complexity="0">
<symbol>result</symbol>
</loadn>
</typeconvn>
</inlinen>
</statementn>
<statementn pos="1152,17">
<exitn resultdef="$void" pos="1152,13" flags="nf_pass1_done" complexity="2">
</exitn>
</statementn>
</blockn>
After:
<blockn resultdef="$void" pos="1150,11" flags="nf_pass1_done" complexity="3">
<statementn resultdef="$void" pos="1151,28" flags="nf_pass1_done" complexity="3">
<assignn resultdef="$void" pos="1151,13" flags="nf_pass1_done" complexity="1">
<loadn resultdef="Int64" pos="1151,19" flags="nf_pass1_done,nf_write,nf_absolute" complexity="0">
<symbol>result</symbol>
</loadn>
<ordconstn resultdef="Int64" pos="1151,21" flags="nf_pass1_done" complexity="0" rangecheck="FALSE">
<value>-1</value>
</ordconstn>
</assignn>
</statementn>
<statementn resultdef="$void" pos="1152,17" flags="nf_pass1_done" complexity="2">
<exitn resultdef="$void" pos="1152,13" flags="nf_pass1_done" complexity="2">
</exitn>
</statementn>
</blockn>
The Classes unit has a few random tidbits. Before:
.Lj5169:
cmpq $0,U_$CLASSES_$$_NEEDRESOLVING(%rip)
je .Lj5171
movq U_$CLASSES_$$_NEEDRESOLVING(%rip),%rax
movq 16(%rax),%rax
movq %rax,-16(%rbp)
jmp .Lj5177
.p2align 4,,10
.p2align 3
.Lj5176:
movq -16(%rbp),%rax
movq 8(%rax),%rax
movq %rax,-16(%rbp)
.Lj5177:
cmpq $0,-16(%rbp)
je .Lj5171
movq -16(%rbp),%rax
movq 16(%rax),%rax
cmpq -8(%rbp),%rax
jne .Lj5176
...
This one is a net zero because a particular peephole optimization isn't performed, since a register is reused - after:
.Lj5169:
cmpq $0,U_$CLASSES_$$_NEEDRESOLVING(%rip)
je .Lj5171
movq U_$CLASSES_$$_NEEDRESOLVING(%rip),%rax
movq 16(%rax),%rax
movq %rax,-16(%rbp)
jmp .Lj5177
.p2align 4,,10
.p2align 3
.Lj5176:
movq -16(%rbp),%rax
movq 8(%rax),%rax
movq %rax,-16(%rbp)
.Lj5177:
movq -16(%rbp),%rax
testq %rax,%rax ; (Not optimised because %rax is reused)
je .Lj5171
; Instruction removed
movq 16(%rax),%rax
cmpq -8(%rbp),%rax
jne .Lj5176
...
movq -16(%rbp),%rax; testq %rax,%rax
doesn't get optimised into cmpq $0,-16(%rbp)
because of not needing to read from -16(%rbp)
multiple times. However, there is potential for future peephole optimisation because .Lj5177 only has two entry points: the instruction before it, and a single unconditional jump. Furthermore, the instruction immediately before the jump and the label are identical, namely movq %rax,-16(%rbp)
, so it's possible to remove both instructions and insert a copy of it immediately after the label and perform more peephole optimisations know that it's known that -16(%rbp)
and %rax
are identical.
Additional Notes:
- One initial sticking point with removing typeconv nodes is that it got confused with generic types which resulted in a bad node tree. To help get around this, a new
is_undefined
utility function was introduced that checked to see if a def was of typeundefineddef
or pointed to such. - A call to
simplify
inttypeconvnode.pass_typecheck
was removed since this caused problems if the child node was of typeaddrn
. It was largely superfluous anyway and gets called again anyway as part of the regular first pass of the node tree. - An addition was made to the XML node dump feature in that type conversion nodes now output the
totypedef
field if it's different toresultdef
(it's omitted if they're the same, which is the most common situation).