Skip to content

[x86 / Bug Fix] Corrected SSE/AVX peephole optimisations under -O3 (Fixes #40401)

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:i40401 into main

Summary

This merge request fixes an error in the peephole optimizer where, due to the behaviour of GetNextInstructionUsingReg under -O3 and above, some SSE/AVX peephole optimisations were applied incorrectly if the original source register was modified between the triggering instruction and the one that got erroneously removed in the situation where the source and final destination registers are identical.

This merge request resolves issue #40401 (closed).

Additionally, some register tracking was omitted and this caused a subtle failure elsewhere. A separate commit has been included to fix this too.

System

  • Processor architecture: x86_64 (also i386 with special settings)

What is the current bug behavior?

Depending on how registers are assigned, the peephole optimizer sometimes makes incorrect optimisations on SSE and AVX instructions.

What is the behavior after applying this patch?

Only correct optimisations are now made where SSE and AVX are concerned.

Additional Notes

New test tests/webtbs/tw40401.pp evaluates this particular failure.

Relevant logs and/or screenshots

In the int unit in packages\numlib\src, a MOVAPD instruction is incorrectly removed:

.Lj287:
	...
	movapd	%xmm0,%xmm15
	movsd	-120(%rbp),%xmm0
	subsd	%xmm8,%xmm0
	mulsd	-144(%rbp),%xmm0
	divsd	%xmm8,%xmm0
	addsd	-136(%rbp),%xmm0
	movapd	%xmm0,%xmm12
	movapd	%xmm15,%xmm0 ; // <-- Gets incorrectly stripped out in the trunk.  %xmm0 is modified between this instruction and "movapd %xmm0,%xmm15" above
	movq	%rbx,%rax
	...

Similar problems occur in a few other units, but no examples appear in the compiler or RTL.

For the commit that fixes the allocation problem, this fixes some issues where parameters are not preserved - with the fix:

	...
.section .text.n_fppdf$_$tpdfpage_$__$$_drawrect$tpdfcoord$single$single$single$boolean$boolean$single,"ax"
	.balign 16,0x90
.globl	FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE:
.Lc705:
.seh_proc FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
	pushq	%rbp
.seh_pushreg %rbp
.Lc706:
	movq	%rsp,%rbp
.Lc707:
	leaq	-96(%rsp),%rsp
.seh_stackalloc 96
.seh_endprologue
	movaps	%xmm2,%xmm3
	movss	72(%rbp),%xmm1
	movss	%xmm1,64(%rsp)
	movb	64(%rbp),%al
	movb	%al,56(%rsp)
	movb	56(%rbp),%al
	movb	%al,48(%rsp)
	movss	48(%rbp),%xmm1
	movss	%xmm1,40(%rsp)
	movss	%xmm3,32(%rsp)
	movss	4(%rdx),%xmm2
	movss	(%rdx),%xmm1
	call	FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$SINGLE$SINGLE$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
	leaq	(%rbp),%rsp
	popq	%rbp
	ret
.seh_endproc
	...

Without it, %xmm3 is erroneously stripped out and DrawRect is called with an undefined parameter:

	...
.section .text.n_fppdf$_$tpdfpage_$__$$_drawrect$tpdfcoord$single$single$single$boolean$boolean$single,"ax"
	.balign 16,0x90
.globl	FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE:
.Lc705:
.seh_proc FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$TPDFCOORD$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
	pushq	%rbp
.seh_pushreg %rbp
.Lc706:
	movq	%rsp,%rbp
.Lc707:
	leaq	-96(%rsp),%rsp
.seh_stackalloc 96
.seh_endprologue
	movss	%xmm2,32(%rsp)
	movss	72(%rbp),%xmm1
	movss	%xmm1,64(%rsp)
	movb	64(%rbp),%al
	movb	%al,56(%rsp)
	movb	56(%rbp),%al
	movb	%al,48(%rsp)
	movss	48(%rbp),%xmm1
	movss	%xmm1,40(%rsp)
	movss	4(%rdx),%xmm2
	movss	(%rdx),%xmm1
	call	FPPDF$_$TPDFPAGE_$__$$_DRAWRECT$SINGLE$SINGLE$SINGLE$SINGLE$SINGLE$BOOLEAN$BOOLEAN$SINGLE
	leaq	(%rbp),%rsp
	popq	%rbp
	ret
.seh_endproc
	...

Merge request reports