Skip to content

[x86 / Bug Fix] Upgrades to CMOV optimisations (Fixes #40307)

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:i40307 into main

Summary

This merge request fixes a bug in the OptPass2Jcc routine where the wrong-sized registers were sometimes loaded into the operands, causing assembler errors in some configurations.

This request contains three distinct commits:

  • The first commit is a pure bug fix to get the job done, and is not too dissimilar from the patch example given by @rasberryrabbit in #40307 (closed).
  • The second commit adds some new features to the optimisation in an attempt to reduce register usage and produce more optimal code. There's still room for improvement at a later date.
  • The third commit is a test that targets the original CMOV bug and also showcases the improvements in the second commit.

System

  • Processor architecture: i386, x86_64

What is the current bug behavior?

In some rare situations, a peephole optimisation will load the wrong-sized register into an operand, causing external assembler errors.

What is the behavior after applying this patch?

Such CMOV optimisations will now always load the correct sized register.

Relevant logs and/or screenshots

Due to its rarity, no changes appear in the RTL or compiler source. However, the assembly dump of tests\webtbs\tw40307.pp showcases the situation nicely (this is shown under x86_64-win64... it's harder to showcase on i386 because of the limited number of registers).

Trunk:

.section .text.n_p$tw40307_$$_setparams$boolean$toutstruct,"ax"
	.balign 16,0x90
.globl	P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT
P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT:
.Lc2:
	movw	$16,%r10w
	movw	$32,%ax
	testb	%cl,%cl
	movl	$8,%ecx
	cmovnew	%r10w,%ax
	cmovnel	%r10w,%ecx ; // <-- Will cause an assembler error because we're using a word-sized register in a DWORD-sized operation
	movw	%ax,(%rdx)
	movl	%ecx,4(%rdx)
.Lc3:
	ret
.Lc1:

First commit - code now compiles correctly:

.section .text.n_p$tw40307_$$_setparams$boolean$toutstruct,"ax"
	.balign 16,0x90
.globl	P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT
P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT:
.Lc2:
	movw	$16,%r10w
	movl	$16,%r11d ; // <-- A different DWORD-sized register is now used
	movw	$32,%ax
	testb	%cl,%cl
	movl	$8,%ecx
	cmovnew	%r10w,%ax
	cmovnel	%r11d,%ecx ; // <-- Register is correct size
	movw	%ax,(%rdx)
	movl	%ecx,4(%rdx)
.Lc3:
	ret
.Lc1:

Second commit - optimises size and register usage:

.section .text.n_p$tw40307_$$_setparams$boolean$toutstruct,"ax"
	.balign 16,0x90
.globl	P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT
P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT:
.Lc2:
	movl	$16,%r10d ; // <-- %r10d can be successfully used for both the word-sized and DWORD-sized CMOV instructions
	movw	$32,%ax
	testb	%cl,%cl
	movl	$8,%ecx
	cmovnew	%r10w,%ax
	cmovnel	%r10d,%ecx ; // <-- Register is correct size
	movw	%ax,(%rdx)
	movl	%ecx,4(%rdx)
.Lc3:
	ret
.Lc1:

Additional Notes

i386-win32 and x86_64-win64 pass without incident or regression.

Merge request reports