Skip to content

[Cross-platform] Inverted integer range node-level optimisation

Summary

This merge request slightly extends the if (v >= x) and (v <= y) then optimisation (where x < y) to also process its logical inverse (courtesy of De Morgan's theorem), if (v < x) or (v > y) then. This construct appears very frequently in code, testing to see if an integer variable falls outside of a valid range and requires very little modification to existing code, and yet produces much more compact code as a result.

This is a node-level optimisation (specifically taddnode.simplify) and therefore it benefits all platforms.

System

  • Processor architecture: Cross-platform

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

Pascal code that checks whether an integer variable falls outside of a given range will now be converted into much more efficient assembly language.

Relevant logs and/or screenshots

The original example was first noted in !612 (merged) and !614 (merged) in the SysUtils unit, where the following Pascal code under x86_64-win64, -O4:

  If (YY<1980) or (YY>2099) then
    Result:=0
  else
    ...

...produces:

.section .text.n_sysutils_$$_datetimetofiledate$tdatetime$$int64,"ax"
	...
	movw	40(%rsp),%dx
	cmpw	$1980,%dx
	setbb	%cl
	cmpw	$2099,%dx
	setab	%dl
	xorl	%eax,%eax
	orb	%dl,%cl
	jne	.Lj11254
	...

When changed to:

  Result:=0;
  If (YY>=1980) and (YY<=2099) then
    ...

It instead produces something far more efficient:

	xorl	%eax,%eax
	movzwl	40(%rsp),%edx
	subl	$1980,%edx
	cmpw	$119,%dx
	jnbe	.Lj11254
	...

Running the original code with this patch, the following is produced:

	movzwl	40(%rsp),%eax
	subl	$1980,%eax
	cmpw	$119,%ax
	movl	$0,%eax
	ja	.Lj11254

This is close to what the modified Pascal code produces (regarding the conditions on the jump, nbe = a). Unfortunately the movl $0,%eax instruction cannot be relcated because %eax is being used as an intermediate register for the conditional check. Nevertheless, it is a massive improvement over the trunk.

An earlier example in SysUtils on x86_64-win64, -O4 - before:

.section .text.n_sysutils_$$_incamonth$word$word$word$longint,"ax"
	...
	cmpl	$11,%eax
	setgb	%cl
	testl	%eax,%eax
	setlb	%dl
	orb	%dl,%cl
	je	.Lj10538
	...

After:

.section .text.n_sysutils_$$_incamonth$word$word$word$longint,"ax"
	...
	cmpl	$11,%eax
	jna	.Lj10538
	...

Outside of SysUtils, a huge number of units receive improvements, include aoptx86 that has grown massively in size over the years due to individual peephole optimisations. One improvement that appears a few times is in LEA optimisations to determine if a calculated offset is within the valid range of a signed 32-bit integer. For example - before:

	...
.Lj3017:
	movq	48(%rsp),%rax
	movq	64(%rax),%rax
	movq	8(%rax),%rax
	movq	(%rax),%rax
	cmpq	$2147483647,%rax
	jg	.Lj3021
	cmpq	$-2147483648,%rax
	jnl	.Lj3023
.Lj3021:
	movq	48(%rsp),%rax
	...

After:

	...
.Lj3017:
	movq	48(%rsp),%rax
	movq	64(%rax),%rax
	movq	8(%rax),%rax
	movq	(%rax),%rax
	subq	$-2147483648,%rax
	movl	$4294967295,%edx
	cmpq	%rdx,%rax
	jna	.Lj3022 ; // <-- this was originally .Lj3023 - label numbering is different.
	movq	48(%rsp),%rax
	...

In the packages, the otherwise tiny zinflate unit receives an improvement - before:

	...
.Lj25:
	cmpw	$8,%si
	setlb	%al
	cmpw	$15,%si
	setgb	%dl
	orb	%dl,%al
	je	.Lj27
	...

After:

	...
.Lj25:
	movzwl	%si,%eax
	subl	$8,%eax
	cmpw	$7,%ax
	jna	.Lj27
	...

Future Development

It's noted that when dealing with word-sized arguments, the final instructions mix 32-bit and 16-bit registers on x86. This is likely due to some inefficiencies in the OptPass2MOV routine that will be investigated at a later date.

Edited by J. Gareth "Kit" Moreton

Merge request reports