
[Cross-platform] The "Val" intrinsic is now simplified for string constants

Summary

This merge request implements node-level simplification of Val instructions when their inputs are deterministic, i.e. when the string argument is a constant known at compile time.

System

  • Processor architecture: All

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Val instructions that take a string constant are now simplified at the node level, so the run-time helper no longer gets called. This is essentially what happens with Str instructions in !346 (merged).
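As a rough illustration of the kind of source this applies to, here is a minimal sketch. It is a hypothetical program, not the contents of test/cg/tval1.pp; the names DoTest, Value and Code and the two string literals are assumptions chosen for the example. It relies only on the documented behaviour of Val: Code is 0 on success, otherwise the 1-based index of the first character that could not be parsed.

```pascal
program valconst;

{ Hypothetical illustration only; not the actual test/cg/tval1.pp. }

procedure DoTest;
var
  Value: LongInt;
  Code: Word;
begin
  { '?' is not a number, so Val reports the failing position (1) in Code. }
  Val('?', Value, Code);
  if Code = 0 then
    Halt(1);

  { '2' parses cleanly: Value becomes 2 and Code becomes 0. }
  Val('2', Value, Code);
  if Code <> 0 then
    Halt(2);
  if Value <> 2 then
    Halt(3);
end;

begin
  DoTest;
  WriteLn('ok');
end.
```

Because both arguments are string literals, the outcome of each Val call is already known at compile time, which is the situation the simplification targets.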

Relevant logs and/or screenshots

No examples appear in the compiler, RTL or packages, so a new test, test/cg/tval1.pp, was added to showcase the feature. On the trunk:

	...
.section .text.n_p$tval1_$$_dotest,"ax"
	.balign 16,0x90
.globl	P$TVAL1_$$_DOTEST
P$TVAL1_$$_DOTEST:
.Lc7:
.seh_proc P$TVAL1_$$_DOTEST
	leaq	-56(%rsp),%rsp
.Lc8:
.seh_stackalloc 56
.seh_endprologue
	leaq	40(%rsp),%r8
	leaq	_$TVAL1$_Ld1(%rip),%rdx
	movl	$4,%ecx
	call	fpc_val_sint_shortstr
	movl	%eax,32(%rsp)
	cmpw	$0,40(%rsp)
	jne	.Lj9
	movl	$1,%ecx
	call	SYSTEM_$$_HALT$LONGINT
.Lj9:
	leaq	40(%rsp),%r8
	leaq	_$TVAL1$_Ld2(%rip),%rdx
	movl	$4,%ecx
	call	fpc_val_sint_shortstr
	movl	%eax,32(%rsp)
	cmpw	$0,40(%rsp)
	je	.Lj12
	movl	$2,%ecx
	call	SYSTEM_$$_HALT$LONGINT
.Lj12:
	...

(_$TVAL1$_Ld1 and _$TVAL1$_Ld2 refer to string constants)

With this improvement under x86_64-win64:

	...
.section .text.n_p$tval1_$$_dotest,"ax"
	.balign 16,0x90
.globl	P$TVAL1_$$_DOTEST
P$TVAL1_$$_DOTEST:
.Lc7:
.seh_proc P$TVAL1_$$_DOTEST
	leaq	-56(%rsp),%rsp
.Lc8:
.seh_stackalloc 56
.seh_endprologue
	movq	$1,40(%rsp)
	movl	$0,32(%rsp)
	cmpw	$0,40(%rsp)
	jne	.Lj9
	movl	$1,%ecx
	call	SYSTEM_$$_HALT$LONGINT
.Lj9:
	movq	$0,40(%rsp)
	movl	$2,32(%rsp)
	cmpw	$0,40(%rsp)
	je	.Lj12
	movl	$2,%ecx
	call	SYSTEM_$$_HALT$LONGINT
.Lj12:
	...

There's still lots of room for improvement though. A peephole optimisation could possibly detect that the CMPW instructions against 40(%rsp) have a deterministic outcome, but more node-level optimisations should help too. The use of the stack is partially mandated by the use of internal temprefs, which is also why there is an unused write to 32(%rsp), corresponding to the Code output of Val, that could be optimised out.
