[x86] New MOV/SHR optimisation when reading from memory
Summary
This merge request optimises reads from memory where only the upper byte or word (16 bits) is actually needed, converting a MOV/SHR pair into a single MOVZX instruction, and a MOV/SAR pair into a single MOVSX instruction.
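For illustration, here is a minimal sketch of the two rewrites (hypothetical registers and addresses; the offsets rely on x86's little-endian layout, where the top byte of a 32-bit value at address A lives at A+3 and the top word at A+2):
# Before: load the full 32 bits, then shift to isolate the top byte
movl (%rax),%edx
shrl $24,%edx
# After: a single zero-extending load of just that byte
movzbl 3(%rax),%edx
# Signed variant: a MOV/SAR pair becomes a sign-extending load
movl (%rax),%edx
sarl $24,%edx
# After:
movsbl 3(%rax),%edx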
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Many new optimisations should now fire when reading the most-significant byte or word of a 32-bit value in memory, among other cases.
Relevant logs and/or screenshots
A large number of files receive improvements - here's an example from aasmcpu (x86_64-win64) where a cascade of optimisations occurs under -O4. Before:
.Lj285:
...
# Register eax allocated
movl 8(%r14),%eax
shrl $24,%eax
andl $255,%eax
# Register rflags allocated
cmpb $4,%al
# Register al released
After:
.Lj285:
...
# Register eax allocated
# Peephole Optimization: movl 8(%r14),%eax; shrl $24,%eax -> movzxbl 11(%r14),%eax (MovShr/Sar2Movx)
# Peephole Optimization: MovzAnd2Movz1
movzbl 11(%r14),%eax
# Register rflags allocated
cmpb $4,%al
# Register al released
(The optimisations in OptPass2Movx should be improved in the future to detect that only the lowest byte of %eax is used, so the MOVZBL instruction can be reduced to movb 11(%r14),%al, which can in turn be folded into the CMP instruction to give just cmpb $4,11(%r14).)
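For reference, a sketch of where that future work could lead (hypothetical output, not produced by this patch; the instructions are taken from the note above):
# Output of this patch:
movzbl 11(%r14),%eax
cmpb $4,%al
# If the peephole learns that only %al is consumed:
movb 11(%r14),%al
cmpb $4,%al
# And with the load folded into the comparison:
cmpb $4,11(%r14)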
Additional notes
There is a fair amount of overlap with the optimisations performed in !278 (merged).