
[x86] Streamlining of x86's OptPass1LEA routine

Summary

This merge request was originally just going to be a refactor of OptPass1LEA, since the stack pointer is now properly tracked. However, it also opened up room for overall improvements in the generated code with very little loss.

The second commit refactors the streamlining so that OptPass1LEA avoids calling expensive routines such as RegModifiedBetween when they aren't necessary (i.e. when p and hp1 are adjacent).
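The fast path can be sketched roughly as follows. This is a hypothetical Python model of the idea only, not the actual FPC Pascal code; the Instr type and the instruction list are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    # Set of register names this instruction writes to (illustrative model).
    modified: set = field(default_factory=set)

def reg_modified_between(reg, instructions, p, hp1):
    # Fast path: if hp1 immediately follows p, there is no intervening
    # instruction, so no register can be modified between them and the
    # expensive scan can be skipped entirely.
    if hp1 == p + 1:
        return False
    # Slow path: scan every instruction strictly between p and hp1.
    return any(reg in instructions[i].modified for i in range(p + 1, hp1))
```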

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

OptPass1LEA should see some speed gains under -O1 and -O2, as well as some additional optimisations; -O4 sees improvements too.

Tests on i386-win32 and x86_64-win64 show no regressions.

Relevant logs and/or screenshots

In the System unit under -O4, an inefficiency was fixed - before:

	...
.Lj3148:
	leaq	(%rbx,%rbx,1),%rax
	leaq	6(%rax),%rdx
	leaq	32(%rsp),%rcx
	call	SYSTEM_$$_GETMEM$POINTER$QWORD
	...

After:

	...
.Lj3148:
	leaq	6(%rbx,%rbx,1),%rdx
	leaq	32(%rsp),%rcx
	call	SYSTEM_$$_GETMEM$POINTER$QWORD
        ...
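As a quick sanity check that the merged form computes the same address, here is a small Python model of LEA's base + index*scale + displacement arithmetic (an illustration, not compiler code; the register values are arbitrary):

```python
# Model of x86 LEA address arithmetic: base + index*scale + displacement.
def lea(base=0, index=0, scale=1, disp=0):
    return base + index * scale + disp

rbx = 123  # arbitrary sample value

# Before: two dependent LEAs.
rax = lea(base=rbx, index=rbx, scale=1)   # leaq (%rbx,%rbx,1),%rax
rdx_before = lea(base=rax, disp=6)        # leaq 6(%rax),%rdx

# After: one merged LEA.
rdx_after = lea(base=rbx, index=rbx, scale=1, disp=6)  # leaq 6(%rbx,%rbx,1),%rdx

assert rdx_before == rdx_after
```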

These little lea/lea optimisations are quite common. There are also some unexpected improvements - for example, before:

.Lj7043:
        ...
	shlq	$1,%rax
	leaq	32(%rax),%rcx
        ...

After:

.Lj7043:
        ...
	leaq	32(%rax,%rax,1),%rcx
        ...
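The shl-into-lea conversion works because a left shift by 1 is the same as adding a register to itself, which LEA can express as base plus index with scale 1. A minimal Python check of that equivalence (an illustration only; the value is arbitrary):

```python
# Model of x86 LEA address arithmetic: base + index*scale + displacement.
def lea(base=0, index=0, scale=1, disp=0):
    return base + index * scale + disp

rax = 99  # arbitrary sample value

# Before: shlq $1,%rax then leaq 32(%rax),%rcx
rcx_before = (rax << 1) + 32
# After: leaq 32(%rax,%rax,1),%rcx  (rax + rax*1 + 32 == 2*rax + 32)
rcx_after = lea(base=rax, index=rax, scale=1, disp=32)

assert rcx_before == rcx_after
```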

Under -O2, the System unit gains an optimisation in which two instructions are merged even though a third instruction sits between them - before:

.Lj6565:
	movq	32(%rsp),%rax
	leaq	1(%rsi),%rdx
	movslq	%edi,%rdi
	addq	%rdi,%rdx
	movq	$0,(%rax,%rdx,8)
        ...

After:

.Lj6565:
	movq	32(%rsp),%rax
	movslq	%edi,%rdi
	leaq	1(%rsi,%rdi),%rdx ; <-- The LEA and the ADD (which may itself have become a LEA by this point) are merged
	movq	$0,(%rax,%rdx,8)
        ...
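The merge across the intervening movslq is valid because that instruction only rewrites %rdi, the operand being folded into the LEA, and the addition commutes. A small Python model of the address arithmetic (illustrative only; sample values are arbitrary, and %rdi is taken as already sign-extended):

```python
# Model of x86 LEA address arithmetic: base + index*scale + displacement.
def lea(base=0, index=0, scale=1, disp=0):
    return base + index * scale + disp

rsi, rdi = 40, 7  # arbitrary sample values

# Before: leaq 1(%rsi),%rdx followed (after the movslq) by addq %rdi,%rdx
rdx_before = lea(base=rsi, disp=1) + rdi
# After: the single merged leaq 1(%rsi,%rdi),%rdx
rdx_after = lea(base=rsi, index=rdi, scale=1, disp=1)

assert rdx_before == rdx_after
```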
Edited by J. Gareth "Kit" Moreton
