[x86 / Bug Fix] Upgrades to CMOV optimisations (Fixes #40307)
Summary
This merge request fixes a bug in the OptPass2Jcc
routine where the wrong-sized registers were sometimes loaded into the operands, causing assembler errors in some configurations.
This request contains three distinct commits:
- The first commit is a pure bug fix to get the job done, and is not too dissimilar from the patch example given by @rasberryrabbit in #40307 (closed).
- The second commit adds some new features to the optimisation in an attempt to reduce register usage and produce more optimal code. There's still room for improvement at a later date.
- The third commit is a test that targets the original
CMOV
bug and also showcases the improvements in the second commit.
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
In some rare situations, a peephole optimisation will load the wrong-sized register into an operand, causing external assembler errors.
What is the behavior after applying this patch?
Such CMOV
optimisations will now always load the correct sized register.
Relevant logs and/or screenshots
Due to its rarity, no changes appear in the RTL or compiler source. However, the assembly dump of tests\webtbs\tw40307.pp
showcases the situation nicely (this is shown under x86_64-win64... it's harder to showcase on i386 because of the limited number of registers).
Trunk:
.section .text.n_p$tw40307_$$_setparams$boolean$toutstruct,"ax"
.balign 16,0x90
.globl P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT
P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT:
.Lc2:
movw $16,%r10w
movw $32,%ax
testb %cl,%cl
movl $8,%ecx
cmovnew %r10w,%ax
cmovnel %r10w,%ecx ; // <-- Will cause an assembler error because we're using a word-sized register in a DWORD-sized operation
movw %ax,(%rdx)
movl %ecx,4(%rdx)
.Lc3:
ret
.Lc1:
First commit - code now compiles correctly:
.section .text.n_p$tw40307_$$_setparams$boolean$toutstruct,"ax"
.balign 16,0x90
.globl P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT
P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT:
.Lc2:
movw $16,%r10w
movl $16,%r11d ; // <-- A different DWORD-sized register is now used
movw $32,%ax
testb %cl,%cl
movl $8,%ecx
cmovnew %r10w,%ax
cmovnel %r11d,%ecx ; // <-- Register is correct size
movw %ax,(%rdx)
movl %ecx,4(%rdx)
.Lc3:
ret
.Lc1:
Second commit - optimises size and register usage:
.section .text.n_p$tw40307_$$_setparams$boolean$toutstruct,"ax"
.balign 16,0x90
.globl P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT
P$TW40307_$$_SETPARAMS$BOOLEAN$TOUTSTRUCT:
.Lc2:
movl $16,%r10d ; // <-- %r10d can be successfully used for both the word-sized and DWORD-sized CMOV instructions
movw $32,%ax
testb %cl,%cl
movl $8,%ecx
cmovnew %r10w,%ax
cmovnel %r10d,%ecx ; // <-- Register is correct size
movw %ax,(%rdx)
movl %ecx,4(%rdx)
.Lc3:
ret
.Lc1:
Additional Notes
i386-win32
and x86_64-win64
pass without incident or regression.