
x86: Missed logic with CMP and MOV optimisations

This merge request fixes a few minor issues with peephole optimisations:

  • cmpq (const),%reg instructions never checked whether any MOV instructions following them could be moved in front of them, because the "CmpJe2NegJo" optimisation exited the entire procedure when the instruction size was S_Q instead of moving on to the next optimisation (the one that moves MOV instructions).
  • The "J(c)Mov0JmpMov1 -> Set(c)" optimisations didn't call AllocRegBetween to properly track the register used by SETcc. There didn't seem to be any adverse effects from this oversight, but incorrect register tracking may cause future problems with optimisations that depend closely on it.
  • "movq $0,%regq" was not optimised to "movl $0,%regl" when the FLAGS register was in use (otherwise it is optimised to xorl %regl,%regl), because the size reduction optimisation only checked values between 1 and $FFFFFFFF and so excluded 0. In practice this case is generally overshadowed by the CMP/MOV fix described above. Also, because this optimisation only applies to x86_64, where 64-bit registers exist, the code is now disabled for i386 and i8086 with no loss.
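
The decision described in the third point can be sketched as a small Python model. This is an illustration only, not the compiler's Pascal code: the function name shrink_mov_imm64 and its parameters are hypothetical, and the model assumes the usual x86_64 rule that writing a 32-bit register zero-extends into the full 64-bit register.

```python
def shrink_mov_imm64(value, reg64, reg32, flags_in_use):
    """Pick an encoding for 'movq $value,%reg64' (hypothetical helper)."""
    # XOR is the shortest way to zero a register, but it overwrites FLAGS,
    # so it is only safe when nothing later reads the current flags.
    if value == 0 and not flags_in_use:
        return f"xorl %{reg32},%{reg32}"
    # A 32-bit MOV zero-extends into the 64-bit register, so any value in
    # 0..$FFFFFFFF can drop the REX prefix. Starting the range at 0 rather
    # than 1 is the case the old check missed.
    if 0 <= value <= 0xFFFFFFFF:
        return f"movl ${value},%{reg32}"
    # Anything else keeps the 64-bit form.
    return f"movq ${value},%{reg64}"


print(shrink_mov_imm64(0, "rax", "eax", flags_in_use=True))
print(shrink_mov_imm64(0, "rax", "eax", flags_in_use=False))
```

With FLAGS in use the model falls back to the 32-bit MOV rather than leaving the 64-bit instruction untouched.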

Under -O4 in the System unit for x86_64-win64 - trunk:

.section .text.n_system_$$_pos$shortstring$char$int64$$int64,"ax"
	.balign 16,0x90
.globl	SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64
SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64:
	cmpb	$1,(%rcx)
	jne	.Lj1460
	cmpb	1(%rcx),%dl
	jne	.Lj1460
	cmpq	$1,%r8
# Peephole Optimization: J(c)Mov1JmpMov0 -> Set(~c) (partial)
	movq	$0,%rax
	seteb	%al
	ret
	.p2align 4,,10
	.p2align 3
.Lj1460:
# Peephole Optimization: xorq %rax,%rax -> xorl %eax,%eax (removes REX prefix)
	xorl	%eax,%eax
	ret

Merge request:

.section .text.n_system_$$_pos$shortstring$char$int64$$int64,"ax"
	.balign 16,0x90
.globl	SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64
SYSTEM_$$_POS$SHORTSTRING$CHAR$INT64$$INT64:
	cmpb	$1,(%rcx)
	jne	.Lj1460
	cmpb	1(%rcx),%dl
	jne	.Lj1460
# Peephole Optimization: Swapped cmp and mov instructions to improve optimisation potential
# Peephole Optimization: xorq %rax,%rax -> xorl %eax,%eax (removes REX prefix)
	xorl	%eax,%eax
	cmpq	$1,%r8
# Peephole Optimization: J(c)Mov1JmpMov0 -> Set(~c) (partial)
	seteb	%al
	ret
	.p2align 4,,10
	.p2align 3
.Lj1460:
# Peephole Optimization: xorq %rax,%rax -> xorl %eax,%eax (removes REX prefix)
	xorl	%eax,%eax
	ret
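
The reordering visible between the two listings can be modelled roughly as follows. This is a toy Python sketch under stated assumptions, not the peephole optimiser itself: instructions are (mnemonic, source, destination) tuples in AT&T operand order, the function name hoist_mov_over_cmp is hypothetical, and the dependency check is a plain substring test that ignores sub-register aliasing such as %eax versus %rax.

```python
def hoist_mov_over_cmp(instrs):
    """Move 'mov $const,%reg' in front of a preceding cmp when independent."""
    out = list(instrs)
    for i in range(len(out) - 1):
        op, a, b = out[i]
        nxt_op, src, dst = out[i + 1]
        # Only constant loads are considered; they read no register or FLAGS.
        is_const_mov = nxt_op.startswith("mov") and src.startswith("$")
        # Toy check: the cmp must not mention the register the mov writes.
        independent = dst not in a and dst not in b
        if op.startswith("cmp") and is_const_mov and independent:
            # MOV does not touch FLAGS, so swapping preserves the comparison
            # result for whatever consumes it afterwards (here, SETcc).
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


before = [("cmpq", "$1", "%r8"), ("movq", "$0", "%rax"), ("seteb", "", "%al")]
for ins in hoist_mov_over_cmp(before):
    print(ins)
```

Once the constant load sits ahead of the comparison, FLAGS are no longer live at that point, which is what allows the zero load to become xorl %eax,%eax in the merge request listing.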

Criteria

Confirm correct compilation, no regressions and slightly improved generated code, especially where the result of a condition is written to a Boolean type that is wider than a byte.

Future Development

Going by the sample procedure alone, there seems to be potential to optimise the conditional jumps by placing xorl %eax,%eax before them, since zeroing that register is common to all the branches.
