[Cross-platform] The "Val" intrinsic is now simplified for string constants
Summary
This merge request implements node-level simplification of Val
instructions if the inputs are deterministic.
System
- Processor architecture: All
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Val instructions that take a string constant are now simplified at compile time, so the run-time helper is no longer called. This is essentially what happens with Str instructions in !346 (merged).
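For illustration, the kind of call this affects might look like the sketch below. This is a hypothetical program, not the contents of test/cg/tval1.pp: the program name, variable names and string literals are assumptions, chosen only so that one call fails to parse and one succeeds.

```pascal
program valsketch;

{ Hypothetical sketch only; names and literals are assumptions and are
  not taken from test/cg/tval1.pp. }

procedure DoTest;
var
  Value: LongInt;
  Code: Word;
begin
  { 'x' is not a valid number, so Val reports the failing position in Code }
  Val('x', Value, Code);
  if Code = 0 then
    Halt(1);

  { '2' parses cleanly, so Code is 0 and Value receives 2 }
  Val('2', Value, Code);
  if Code <> 0 then
    Halt(2);
end;

begin
  DoTest;
  WriteLn('ok');
end.
```

Because both arguments to Val are string constants, the result and the error code are known at compile time, which is what allows the call to be replaced by plain stores.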
Relevant logs and/or screenshots
No examples appear in the compiler, RTL or packages, so a new test, test/cg/tval1.pp, was added to showcase the feature. On the trunk:
...
.section .text.n_p$tval1_$$_dotest,"ax"
.balign 16,0x90
.globl P$TVAL1_$$_DOTEST
P$TVAL1_$$_DOTEST:
.Lc7:
.seh_proc P$TVAL1_$$_DOTEST
leaq -56(%rsp),%rsp
.Lc8:
.seh_stackalloc 56
.seh_endprologue
leaq 40(%rsp),%r8
leaq _$TVAL1$_Ld1(%rip),%rdx
movl $4,%ecx
call fpc_val_sint_shortstr
movl %eax,32(%rsp)
cmpw $0,40(%rsp)
jne .Lj9
movl $1,%ecx
call SYSTEM_$$_HALT$LONGINT
.Lj9:
leaq 40(%rsp),%r8
leaq _$TVAL1$_Ld2(%rip),%rdx
movl $4,%ecx
call fpc_val_sint_shortstr
movl %eax,32(%rsp)
cmpw $0,40(%rsp)
je .Lj12
movl $2,%ecx
call SYSTEM_$$_HALT$LONGINT
.Lj12:
...
(_$TVAL1$_Ld1 and _$TVAL1$_Ld2 refer to string constants)
With this improvement under x86_64-win64:
...
.section .text.n_p$tval1_$$_dotest,"ax"
.balign 16,0x90
.globl P$TVAL1_$$_DOTEST
P$TVAL1_$$_DOTEST:
.Lc7:
.seh_proc P$TVAL1_$$_DOTEST
leaq -56(%rsp),%rsp
.Lc8:
.seh_stackalloc 56
.seh_endprologue
movq $1,40(%rsp)
movl $0,32(%rsp)
cmpw $0,40(%rsp)
jne .Lj9
movl $1,%ecx
call SYSTEM_$$_HALT$LONGINT
.Lj9:
movq $0,40(%rsp)
movl $2,32(%rsp)
cmpw $0,40(%rsp)
je .Lj12
movl $2,%ecx
call SYSTEM_$$_HALT$LONGINT
.Lj12:
...
There's still lots of room for improvement though. A peephole optimisation could possibly detect that the CMPW instructions against 40(%rsp) have deterministic results, but more node-level optimisations should help too, since the use of the stack is partially mandated by the use of internal temprefs. That is also why there is an unused write to 32(%rsp), which could be optimised out; judging by the listings, that slot holds the numeric result of Val, while its Code output lives at 40(%rsp).