[x86] Bug fix for XMM block move optimisations with overlapping memory

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:i39627 into main

Summary

This merge request fixes a bug in the XMM block move optimisation (!48 (closed)) in the peephole optimiser: the optimisation does not account for the possibility that the source and destination memory overlap. This fixes issue #39627 (closed).

This is done by introducing a new RefsMightOverlap function and using it in the relevant optimisation routine.
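As a rough illustration of what a conservative overlap test of this kind can look like, here is a minimal sketch in C. It is not the code from this merge request: the MemRef structure, its fields and the refs_might_overlap name are hypothetical stand-ins, and the sketch assumes that two references can only be proven disjoint when they use the same base and index registers and the same scale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified memory reference: offset(base,index,scale).
   Register numbers are arbitrary integers; 0 means "no register". */
typedef struct {
    int     base;
    int     index;
    int     scale;
    int64_t offset;
} MemRef;

/* Returns true unless the two size-byte ranges can be PROVEN disjoint.
   Assumption: references through different registers may alias, so they
   are always reported as possibly overlapping. */
static bool refs_might_overlap(const MemRef *a, const MemRef *b, int64_t size)
{
    assert(size > 0);

    if (a->base != b->base || a->index != b->index || a->scale != b->scale)
        return true; /* cannot compare the addresses, so be conservative */

    /* Same registers: the ranges [offset, offset + size) overlap exactly
       when the offsets are less than size bytes apart. */
    int64_t delta = a->offset - b->offset;
    return delta > -size && delta < size;
}
```

With this sketch, -72(%rbp) and -80(%rbp) are reported as overlapping for a 16-byte access but not for an 8-byte one, which matches the case described in this merge request.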

System

  • Processor architecture: x86_64, i386 (when SSE is enabled)

What is the current bug behavior?

When the peephole optimiser encounters a sequence of memory moves such as the following:

	movq	-64(%rbp),%rax
	movq	%rax,-72(%rbp)
	movq	-72(%rbp),%rax
	movq	%rax,-80(%rbp)

This is erroneously changed to:

	movdqu	-72(%rbp),%xmm0
	movdqa	%xmm0,-80(%rbp)

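In this example, a different value ends up in -80(%rbp) because of the overlap: the original sequence stores the value just copied into -72(%rbp), whereas the combined move reads -72(%rbp) before it has been overwritten.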
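The difference can be reproduced outside the compiler with a small model of the two instruction sequences. The following C sketch is illustrative only: the three-element array stands in for the stack slots at -80(%rbp), -72(%rbp) and -64(%rbp), and the function names and test values are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the stack frame, one 8-byte slot per entry. */
enum { S80 = 0, S72 = 1, S64 = 2 }; /* -80(%rbp), -72(%rbp), -64(%rbp) */

/* Models the original four instructions: two 8-byte copies via %rax. */
static void scalar_moves(uint64_t *slot)
{
    uint64_t rax;
    rax = slot[S64]; slot[S72] = rax; /* movq -64(%rbp),%rax; movq %rax,-72(%rbp) */
    rax = slot[S72]; slot[S80] = rax; /* movq -72(%rbp),%rax; movq %rax,-80(%rbp) */
}

/* Models the merged form: one 16-byte load followed by one 16-byte store. */
static void xmm_move(uint64_t *slot)
{
    unsigned char xmm0[16];
    memcpy(xmm0, &slot[S72], sizeof xmm0); /* movdqu -72(%rbp),%xmm0 */
    memcpy(&slot[S80], xmm0, sizeof xmm0); /* movdqa %xmm0,-80(%rbp) */
}

/* Runs both forms on identical input and checks where they diverge. */
static void demo(void)
{
    uint64_t a[3] = {0xAAAA, 0xBBBB, 0xCCCC};
    uint64_t b[3] = {0xAAAA, 0xBBBB, 0xCCCC};

    scalar_moves(a);
    xmm_move(b);

    assert(a[S72] == b[S72]); /* both forms copy -64(%rbp) into -72(%rbp) */
    assert(a[S80] != b[S80]); /* but -80(%rbp) receives different values */
}
```

Calling demo() from a main function passes both assertions: the scalar form propagates the value from -64(%rbp) into both lower slots, while the merged form stores the old contents of -72(%rbp) into -80(%rbp).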

What is the behavior after applying this patch?

Such code is no longer 'optimised' when the source and destination references might overlap.

Relevant logs and/or screenshots

See "What is the current bug behavior?" above and issue #39627 (closed).

Additional notes

Unfortunately, this removes a lot of individual optimisations from the compiler and the RTL, but some of these were potentially unsafe anyway. For example, code may read from %rcx and write to %rdx while the actual parameters held in these registers are almost the same pointer, differing by only a few bytes. RefsMightOverlap could be improved at a later date to, say, treat %rsp and %rbp differently, since their values are often close to each other, and to permit certain combinations if the offsets are far enough apart.
