[x86_64 / Bug Fix] Fixed incorrect truncation due to CMOV
Summary
This merge request fixes a problem in the x86_64 peephole optimizer stemming from a quirk of the architecture: a 32-bit CMOV instruction zeroes the upper 32 bits of its 64-bit destination register even if the condition is false. A combination of peephole optimisations could therefore produce an incorrectly-truncated 64-bit value.
One particular optimisation, MovlMovq2MovlMovl 1, attempted to generate more efficient code by noting that the upper 32 bits of the destination register of the MOVQ instruction would always be zero. Unfortunately, in doing so, the peephole optimizer loses the information that the full 64-bit register is actually in use. If this MOV pair is then collapsed into a single 32-bit MOV that happens to lie within a conditional block, it gets converted into a 32-bit CMOV, and the upper 32 bits of the register are overwritten with zeroes even when the condition is false. The solution was to move the MovlMovq2MovlMovl 1 optimisation from pass 1 to pass 2, which allows the extensive CMOV optimisation code to run first, while it still has full knowledge of the 64-bit register's usage.
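To make the interaction concrete, here is a minimal sketch of the faulty transformation chain; the registers are illustrative and not taken from the actual test case:
movl %ecx,%edx # zero-extend %ecx into %rdx via a 32-bit write
movq %rdx,%rax # copy the full 64-bit value
# MovlMovq2MovlMovl 1 notes that the upper 32 bits of %rdx are zero,
# so the 64-bit copy can become a 32-bit one, and subsequent passes
# can then collapse the pair into a single 32-bit MOV:
movl %ecx,%eax
# If that MOV lies within a conditional block, it is converted to a
# CMOV, and the bug appears:
cmovgl %ecx,%eax # zeroes the upper 32 bits of %rax even when the
                 # condition is false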
Fixes #41317 (closed).
System
- Operating system: Windows (very likely others)
- Processor architecture: x86_64
What is the current bug behavior?
In certain situations, conditional constructs can cause 64-bit integers to get their upper 32 bits zeroed.
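The underlying architectural quirk can be demonstrated in isolation (the registers and values here are hypothetical):
movq $-1,%rax    # %rax = 0xFFFFFFFFFFFFFFFF
xorl %ecx,%ecx   # %ecx = 0; this also sets ZF, so "greater" is false
cmovgl %ecx,%eax # the condition is false, so %eax keeps its value,
                 # yet %rax still becomes 0x00000000FFFFFFFF: a 32-bit
                 # CMOV writes its destination register, and hence
                 # zeroes the upper 32 bits, even when the condition
                 # is false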
What is the behavior after applying this patch?
64-bit values are no longer truncated.
Relevant logs and/or screenshots
From the original test case in #41317 (closed), before the patch (-O2, x86_64-win64):
.section .text.n_p$bug_$$_test$longword$int64$$int64,"ax"
.balign 16,0x90
.globl P$BUG_$$_TEST$LONGWORD$INT64$$INT64
P$BUG_$$_TEST$LONGWORD$INT64$$INT64:
.Lc2:
testq %rdx,%rdx
# Peephole Optimization: CMOV Block (Simple type)
# Peephole Optimization: MovAnd2Mov 1 done
# Peephole Optimization: Mov2Nop 8 done
# Peephole Optimization: Made 32-to-64-bit zero extension more efficient (MovlMovq2MovlMovl 1)
cmovgl %ecx,%edx # <-- if the condition is false, %rdx is supposed to pass through unmodified, but instead its upper 32 bits get zeroed.
movq %rdx,%rax
.Lc3:
ret
.Lc1:
After the patch:
.section .text.n_p$bug_$$_test$longword$int64$$int64,"ax"
.balign 16,0x90
.globl P$BUG_$$_TEST$LONGWORD$INT64$$INT64
P$BUG_$$_TEST$LONGWORD$INT64$$INT64:
.Lc2:
testq %rdx,%rdx
# Peephole Optimization: CMOV Block (Simple type)
# Peephole Optimization: MovAnd2Mov 1 done
cmovgl %ecx,%r8d # <-- The "MovlMovq2MovlMovl 1" and related "Mov2Nop 8" optimisations are no longer performed.
cmovgq %r8,%rdx # <-- Both MOVs become CMOV instructions and only the full %rdx is conditionally written to.
movq %rdx,%rax
.Lc3:
ret
.Lc1:
Additional notes
Optimisation may be slightly weaker after moving MovlMovq2MovlMovl 1 to pass 2, and there is definite room for improvement, but that falls under the category of an enhancement rather than a bug fix and will be addressed in a future merge request.
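As a purely illustrative sketch of one such enhancement (not part of this patch, and only valid if %r8 is a scratch register that is dead afterwards), the conditional pair in the "after" output could in principle be rewritten as an unconditional zero-extension followed by a single 64-bit CMOV:
movl %ecx,%r8d  # unconditional zero-extension into a scratch register
cmovgq %r8,%rdx # a single 64-bit CMOV; %rdx is left untouched when
                # the condition is false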