Skip to content

[x86] Bug fix for "J(c)Mov1Jmp/RetMov0Jmp/Ret -> Set(~c)Jmp/Ret" optimisation

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:i39922 into main

Summary

This merge request fixes #39922 (closed) where a mov $0,%reg instruction was inserted on the wrong side of a SETcc instruction as part of an optimisation.

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

In situations where the "J(c)Mov1Jmp/RetMov0Jmp/Ret -> Set(~c)Jmp/Ret" optimisation is applied to a destination register that is larger than 8 bits in size, a mov $0,%reg instruction is generated to zero-extend the result set by the SETcc instruction. However, due to some faulty logic, this instruction was inserted after the SETcc instruction rather than before, thereby nullifying its effect and causing the result to always be zero.

What is the behavior after applying this patch?

  • The mov $0,%reg instruction is now in the correct place.
  • Additionally, under -Os, a MOVZX instruction is favoured instead.

Relevant logs and/or screenshots

For the new tw39922.pp test - before:

	...
# [13] ((AnsiChar(c) >= 'a') and (AnsiChar(c) <= 'z'))
	leal	-97(%ecx),%eax
	# Register ecx released
	# Register rflags allocated
	cmpb	$25,%al
	# Register al released
# Peephole Optimization: J(c)Mov1Jmp/RetMov0Jmp/Ret -> Set(~c)Jmp/Ret
	setbeb	%al
	movl	$0,%eax ; <-- Instruction essentially nullifies the setbeb instruction.
	ret
	# Register rflags released
.Lj5:
	# Register eax allocated
# [15] Result := 1
	movl	$1,%eax
# Peephole Optimization: Converted JMP to RET as part of SETcc optimisation (1st jump)
	ret
	...

After:

	...
# [13] ((AnsiChar(c) >= 'a') and (AnsiChar(c) <= 'z'))
	leal	-97(%ecx),%eax
	# Register ecx released
	# Register rflags allocated
	cmpb	$25,%al
	# Register al released
# Peephole Optimization: J(c)Mov1Jmp/RetMov0Jmp/Ret -> Set(~c)Jmp/Ret
	# Register rax allocated
	movl	$0,%eax
	setbeb	%al
	ret
	# Register rflags,rax released
.Lj5:
	# Register eax allocated
# [15] Result := 1
	movl	$1,%eax
# Peephole Optimization: Converted JMP to RET as part of SETcc optimisation (1st jump)
	ret
	...

For the new tw39922a.pp test that tests the -Os version - before:

	...
# [13] ((AnsiChar(c) >= 'a') and (AnsiChar(c) <= 'z'))
	leal	-97(%ecx),%eax
	# Register ecx released
	# Register rflags allocated
	cmpb	$25,%al
	# Register al released
# Peephole Optimization: J(c)Mov1Jmp/RetMov0Jmp/Ret -> Set(~c)Jmp/Ret
	setbeb	%al
	movl	$0,%eax ; <- MOV instruction in the wrong place.
	ret
	# Register rflags released
.Lj5:
	# Register eax allocated
# [15] Result := 1
	movl	$1,%eax
# Peephole Optimization: Converted JMP to RET as part of SETcc optimisation (1st jump)
	ret
	...

After:

	...
# [13] ((AnsiChar(c) >= 'a') and (AnsiChar(c) <= 'z'))
	leal	-97(%ecx),%eax
	# Register ecx released
	# Register rflags allocated
	cmpb	$25,%al
	# Register al released
# Peephole Optimization: J(c)Mov1Jmp/RetMov0Jmp/Ret -> Set(~c)Jmp/Ret
	# Register rax allocated
	setbeb	%al
# Peephole Optimization: var9
	andl	$255,%eax ; <- MOVZX instruction gets inserted instead, and is then converted into an AND instruction.
	ret
	# Register rflags,rax released
.Lj5:
	# Register eax allocated
# [15] Result := 1
	movl	$1,%eax
# Peephole Optimization: Converted JMP to RET as part of SETcc optimisation (1st jump)
	ret
	...

Additional Notes

There was an oversight where a conditional jump was converted into a RET instruction but its is_jmp property was not reset, which caused the andl $255,%eax instruction to be converted into a testl $255,%eax instruction due to this issue and the flags register still being in use (its deallocation could potentially be moved to right after the SETcc instruction). This was corrected by changing the code to use ConvertJumpToRET like the other jumps in the routine, but a minor addition was also added to this utility method, which was to ensure the condition is set to C_None... this means the method now works correctly for both conditional and unconditional jumps.

Register allocation was also corrected - the target register was not always allocated across the SETcc and supporting instructions.

The -Os optimisation is not as good as it could be due to an inefficiency that is fixed in !296 (merged).

Edited by J. Gareth "Kit" Moreton

Merge request reports