[x86 / Bug Fix] Corrected SSE/AVX peephole optimisations under -O3 (Fixes #40401)
Summary
This merge request fixes an error in the peephole optimizer where, due to the behaviour of GetNextInstructionUsingReg
under -O3
and above, some SSE/AVX peephole optimisations were applied incorrectly if the original source register was modified between the triggering instruction and the one that got erroneously removed in the situation where the source and final destination registers are identical.
This merge request resolves issue #40401 (closed).
Additionally, some register tracking was omitted and this caused a subtle failure elsewhere. A separate commit has been included to fix this too.
System
- Processor architecture: x86_64 (also i386 with special settings)
What is the current bug behavior?
Depending on how registers are assigned, the peephole optimizer sometimes makes incorrect optimisations on SSE and AVX instructions.
What is the behavior after applying this patch?
Only correct optimisations are now made where SSE and AVX are concerned.
Additional Notes
New test tests/webtbs/tw40401.pp
evaluates this particular failure.
Relevant logs and/or screenshots
In the int
unit in packages\numlib\src
, a MOVAPD
instruction is incorrectly removed:
.Lj287:
...
movapd %xmm0,%xmm15
movsd -120(%rbp),%xmm0
subsd %xmm8,%xmm0
mulsd -144(%rbp),%xmm0
divsd %xmm8,%xmm0
addsd -136(%rbp),%xmm0
movapd %xmm0,%xmm12
movapd %xmm15,%xmm0 ; // <-- Gets incorrectly stripped out in the trunk. %xmm0 is modified between this instruction and "movapd %xmm0,%xmm15" above
movq %rbx,%rax
...
Similar problems occur in a few other units, but no examples appear in the compiler or RTL.
For the commit that fixes the allocation problem, this fixes some issues where parameters are not preserved - with the fix:
...
.section .text.n_fppdf$_$tpdfpage_$__$$_drawrect$tpdfcoord$single$single$single$boolean$boolean$single,"ax"
.balign 16,0x90
.globl FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE:
.Lc705:
.seh_proc FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
pushq %rbp
.seh_pushreg %rbp
.Lc706:
movq %rsp,%rbp
.Lc707:
leaq -96(%rsp),%rsp
.seh_stackalloc 96
.seh_endprologue
movaps %xmm2,%xmm3
movss 72(%rbp),%xmm1
movss %xmm1,64(%rsp)
movb 64(%rbp),%al
movb %al,56(%rsp)
movb 56(%rbp),%al
movb %al,48(%rsp)
movss 48(%rbp),%xmm1
movss %xmm1,40(%rsp)
movss %xmm3,32(%rsp)
movss 4(%rdx),%xmm2
movss (%rdx),%xmm1
call FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$SINGLE$SINGLE$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
leaq (%rbp),%rsp
popq %rbp
ret
.seh_endproc
...
Without it, %xmm3
is erroneously stripped out and DrawRect
is called with an undefined parameter:
...
.section .text.n_fppdf$_$tpdfpage_$__$$_drawrect$tpdfcoord$single$single$single$boolean$boolean$single,"ax"
.balign 16,0x90
.globl FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE:
.Lc705:
.seh_proc FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
pushq %rbp
.seh_pushreg %rbp
.Lc706:
movq %rsp,%rbp
.Lc707:
leaq -96(%rsp),%rsp
.seh_stackalloc 96
.seh_endprologue
movss %xmm2,32(%rsp)
movss 72(%rbp),%xmm1
movss %xmm1,64(%rsp)
movb 64(%rbp),%al
movb %al,56(%rsp)
movb 56(%rbp),%al
movb %al,48(%rsp)
movss 48(%rbp),%xmm1
movss %xmm1,40(%rsp)
movss 4(%rdx),%xmm2
movss (%rdx),%xmm1
call FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$SINGLE$SINGLE$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
leaq (%rbp),%rsp
popq %rbp
ret
.seh_endproc
...