[x86] Unnecessary CMP and TEST instructions stripped

Summary

This merge request takes a closer look at the jumps that follow CMP and TEST instructions. If, once everything else is optimised, such a jump ends up with zero distance and is subsequently removed, nothing consumes the flags any more, so the original CMP/TEST instruction is unnecessary and is removed as well.
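
As a rough illustration of the idea, the toy Python model below works on a list of AT&T-style instruction strings. It is only a sketch under stated assumptions, not the compiler's implementation: every helper name is hypothetical, the list of flag-writing mnemonics is deliberately tiny, and the liveness check is conservative (flags count as dead only if they are overwritten before being read along the fall-through path).

```python
# Toy model only: instructions are AT&T-style strings, labels end with ":".
# None of these helpers exist in the compiler; they are hypothetical.

FLAG_WRITERS = {"cmp", "test", "add", "sub", "and", "or", "xor"}


def is_label(line):
    return line.endswith(":")


def mnemonic(line):
    return line.split()[0]


def jump_target(line):
    """Return the label a jump goes to, or None if the line is not a jump."""
    parts = line.split()
    if not is_label(line) and parts[0].startswith("j") and len(parts) == 2:
        return parts[1]
    return None


def reads_flags(line):
    m = mnemonic(line)
    return (m.startswith("j") and m != "jmp") or m.startswith(("set", "cmov"))


def writes_flags(line):
    # Assumes every flag-writing mnemonic carries a one-letter size suffix.
    return mnemonic(line)[:-1] in FLAG_WRITERS


def collapse_zero_distance_jumps(code):
    """Drop jumps whose target label follows with only labels in between."""
    out = []
    for i, line in enumerate(code):
        target = jump_target(line)
        if target is not None:
            j = i + 1
            while j < len(code) and is_label(code[j]) and code[j] != target + ":":
                j += 1
            if j < len(code) and code[j] == target + ":":
                continue  # zero-distance jump: falling through is equivalent
        out.append(line)
    return out


def flags_dead_after(code, i):
    """True if the flags set at index i are overwritten before being read."""
    for line in code[i + 1:]:
        if is_label(line):
            continue
        if reads_flags(line):
            return False
        if writes_flags(line):
            return True
        if mnemonic(line) in ("jmp", "ret"):
            return False  # control leaves the listing: stay conservative
    return False  # ran off the end: stay conservative


def strip_unused_compares(code):
    """Collapse zero-distance jumps, then drop CMP/TEST with dead flags."""
    code = collapse_zero_distance_jumps(code)
    return [
        line for i, line in enumerate(code)
        if is_label(line)
        or mnemonic(line)[:-1] not in ("cmp", "test")
        or not flags_dead_after(code, i)
    ]
```

In this model, a `testl` followed by a `je` to the very next label loses first the jump and then the `testl` itself, whereas a `cmpl` followed by a jump that really goes somewhere is left alone.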

Additionally, the Cmp1Jl2Cmp0Jle optimisation now catches the inverse case as well (the optimisation above made the need for it more prevalent).
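
Going by its name, Cmp1Jl2Cmp0Jle rewrites "cmp $1; jl" as "cmp $0; jle". The text here does not spell out what the inverse case is; the sketch below assumes it is "cmp $1; jge" becoming "cmp $0; jg". It models the flags of an 8-bit signed compare (the function names are hypothetical) so that both equivalences can be checked exhaustively.

```python
def cmp_flags(imm, x, bits=8):
    """ZF, SF, OF after AT&T 'cmp $imm, x' (computes x - imm) on signed x."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (x - imm) & mask
    zf = result == 0
    sf = bool(result & sign)
    of = not (-sign <= x - imm < sign)  # true difference does not fit
    return zf, sf, of


def jl(zf, sf, of):
    return sf != of


def jle(zf, sf, of):
    return zf or sf != of


def jge(zf, sf, of):
    return sf == of


def jg(zf, sf, of):
    return not zf and sf == of


def rewrite_is_safe(bits=8):
    """Check both rewrites for every signed value of the given width."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    return all(
        jl(*cmp_flags(1, x, bits)) == jle(*cmp_flags(0, x, bits))
        and jge(*cmp_flags(1, x, bits)) == jg(*cmp_flags(0, x, bits))
        for x in range(lo, hi)
    )
```

The check includes the most negative value, where x - 1 overflows and the overflow flag is what keeps jl correct.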

Part refactor, part bug fix: the CollapseZeroDistJump method no longer deletes the label if it becomes dead, and no longer updates the register tracking. The latter change means the function can be called safely on a tai entry that isn't the current instruction, and it also removes awkward side-effect behaviour.

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

N/A

What is the behavior after applying this patch?

  • Unnecessary CMP and TEST instructions are removed.
  • TEST instructions are merged more frequently (slightly emergent behaviour, but it is due to the jumps being optimised in a different order).

Additional notes.

  • The CollapseZeroDistJump refactor was necessary to prevent phantom problems occurring due to the register tracking being updated and labels being removed at inconvenient times.
  • Some jumps may change from jnb to jnc or from jle to jng, again because the jumps are optimised in a different order. These changes are harmless because each pair consists of aliases: they check the same flags and ultimately assemble to the same machine code.
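
To illustrate the point about aliases, the short-form (8-bit displacement) opcodes of a few conditional jumps are listed below as a small Python lookup. The opcode values are the standard x86 encodings; the table and helper names are just for this sketch.

```python
# Short-form Jcc opcodes (one opcode byte followed by an 8-bit displacement).
# Mnemonics that share an opcode are aliases of one another.
SHORT_JCC_OPCODE = {
    "jb": 0x72, "jc": 0x72, "jnae": 0x72,   # CF = 1
    "jnb": 0x73, "jnc": 0x73, "jae": 0x73,  # CF = 0
    "jl": 0x7C, "jnge": 0x7C,               # SF <> OF
    "jle": 0x7E, "jng": 0x7E,               # ZF = 1 or SF <> OF
    "jg": 0x7F, "jnle": 0x7F,               # ZF = 0 and SF = OF
}


def same_machine_code(a, b):
    """True if the two mnemonics assemble to the same opcode."""
    return SHORT_JCC_OPCODE[a] == SHORT_JCC_OPCODE[b]
```

Here `same_machine_code("jnb", "jnc")` and `same_machine_code("jle", "jng")` both hold, while `jl` and `jle` are genuinely different conditions.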

Relevant logs and/or screenshots

All logs under -O4, x86_64-win64.

A simple example of removing a TEST instruction, in aasmdata - before:

.section .text.n_aasmdata$_$tasmdata_$__$$_defineasmsymbolbyclassbase$hfnakgxhwz_o,"ax"
	...
.Lj104:
	movb	%r12b,45(%r14)
	cmpb	44(%r14),%dil
	je	.Lj109
	movq	%r14,%rcx
	call	AASMBASE$_$TASMSYMBOL_$__$$_GETREFS$$LONGINT
	testl	%eax,%eax ; <- Completely unnecessary TEST instruction because the flags it sets are never used
.Lj109:
	movb	%dil,44(%r14)
	jmp	.Lj111
	...

After:

.section .text.n_aasmdata$_$tasmdata_$__$$_defineasmsymbolbyclassbase$hfnakgxhwz_o,"ax"
	...
.Lj104:
	movb	%r12b,45(%r14)
	cmpb	44(%r14),%dil
	je	.Lj109
	movq	%r14,%rcx
	call	AASMBASE$_$TASMSYMBOL_$__$$_GETREFS$$LONGINT
.Lj109:
	movb	%dil,44(%r14)
	jmp	.Lj111
	...

In the aasmcpu unit, thanks to a label not being marked as dead, a conditional branch is able to skip further ahead and avoid a second conditional check whose outcome is already known - before:

.globl	AASMCPU_$$_OPTIMIZE_REF$TREFERENCE$BOOLEAN
	...
	movb	%dl,%sil
	...
	testb	%dl,%dl
	je	.Lj953
	call	AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
	cmpl	32(%rbx),%eax
	je	.Lj951
.Lj953:
	testb	%sil,%sil ; (%dl = %sil if we arrived here from je .Lj953, so jne .Lj950 will never branch)
	jne	.Lj950
	movq	%rbx,%rcx
	call	AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
	...

After:

.globl	AASMCPU_$$_OPTIMIZE_REF$TREFERENCE$BOOLEAN
	...
	movb	%dl,%sil
	...
	testb	%dl,%dl
	je	.Lj956
	call	AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
	cmpl	32(%rbx),%eax
	je	.Lj951
	testb	%sil,%sil
	jne	.Lj950
.Lj956:
	movq	%rbx,%rcx
	call	AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
	...
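
The two listings above can be compared with a small Python model of their control flow. This is a sketch under assumptions: it supposes that %sil still holds the original %dl when it is tested (neither the elided instructions nor the call change it), and it reduces the result of cmpl 32(%rbx),%eax to a boolean. The function names are hypothetical.

```python
from itertools import product


def before(dl, call_equals_mem):
    sil = dl                     # movb %dl,%sil
    if dl != 0:                  # testb %dl,%dl / je .Lj953 not taken
        if call_equals_mem:      # cmpl 32(%rbx),%eax / je .Lj951
            return ".Lj951"
    # .Lj953:
    if sil != 0:                 # testb %sil,%sil / jne .Lj950
        return ".Lj950"
    return "second call"


def after(dl, call_equals_mem):
    sil = dl                     # movb %dl,%sil
    if dl != 0:                  # testb %dl,%dl / je .Lj956 not taken
        if call_equals_mem:      # cmpl 32(%rbx),%eax / je .Lj951
            return ".Lj951"
        if sil != 0:             # testb %sil,%sil / jne .Lj950
            return ".Lj950"
    # .Lj956:
    return "second call"


def listings_agree():
    """Compare both versions for every byte value and compare outcome."""
    return all(
        before(dl, eq) == after(dl, eq)
        for dl, eq in product(range(256), (False, True))
    )
```

Both versions reach the same destination for every input; the only difference is that the path taken when %dl is zero no longer executes the second test.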

In aoptx86, many TEST instructions are merged and some small branches are removed - before:

.globl	AOPTX86$_$TX86ASMOPTIMIZER_$__$$_REGREADBYINSTRUCTION$TREGISTER$TAI$$BOOLEAN
	...
	jmp	.Lj303
	.balign 16,0x90
.Lj463:
	testb	$8,9(%r12)
	jne	.Lj501
	testb	$32,9(%r12)
	setneb	%dil
	jmp	.Lj303
.Lj501:
	movb	$1,%dil
	jmp	.Lj303
	.balign 16,0x90
.Lj464:
	testb	$32,8(%r12)
	jne	.Lj505
	testb	$8,9(%r12)
	jne	.Lj505
	testb	$32,9(%r12)
	setneb	%dil
	jmp	.Lj303
.Lj505:
	...

After:

.globl	AOPTX86$_$TX86ASMOPTIMIZER_$__$$_REGREADBYINSTRUCTION$TREGISTER$TAI$$BOOLEAN
	...
	jmp	.Lj303
	.balign 16,0x90
.Lj463:
	testb	$40,9(%r12)
	setneb	%dil
	jmp	.Lj303
	.balign 16,0x90
.Lj464:
	testb	$32,8(%r12)
	jne	.Lj505
	testb	$40,9(%r12)
	setneb	%dil
	jmp	.Lj303
.Lj505:
	...
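
The merged mask works because 40 = 8 | 32, so a single test of both bits gives the same answer as testing them one after the other. A minimal Python check of the .Lj463 branch, with hypothetical function names, returning the value that ends up in %dil:

```python
def two_tests(byte):
    """Before: testb $8 / jne (sets 1), otherwise testb $32 / setne."""
    if byte & 8:
        return 1
    return 1 if byte & 32 else 0


def merged_test(byte):
    """After: testb $40 / setne."""
    return 1 if byte & 40 else 0


def merge_is_safe():
    """Compare both versions for every possible byte at 9(%r12)."""
    return all(two_tests(b) == merged_test(b) for b in range(256))
```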

This is also an interesting candidate for a future optimisation: the last 3 instructions of the .Lj464 branch are identical to the .Lj463 branch, so the two could theoretically be merged (although there may be a minor performance cost because .Lj463 would no longer be an aligned jump target) - i.e.:

.globl	AOPTX86$_$TX86ASMOPTIMIZER_$__$$_REGREADBYINSTRUCTION$TREGISTER$TAI$$BOOLEAN
	...
	jmp	.Lj303
	.balign 16,0x90
.Lj464:
	testb	$32,8(%r12)
	jne	.Lj505
.Lj463:
	testb	$40,9(%r12)
	setneb	%dil
	jmp	.Lj303
.Lj505:
	...
Edited by J. Gareth "Kit" Moreton
