[x86] New MOV/SHR optimisation when reading from memory
Summary
This merge request optimises reads from memory where only the upper byte or word (16 bits) is actually needed, converting a MOV/SHR pair into a single MOVZX instruction, and a MOV/SAR pair into a single MOVSX instruction.
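For illustration, here is a minimal sketch of the two rewrites (hypothetical registers and addresses; the offsets rely on x86's little-endian layout, where the top byte of a 32-bit value at address A lives at A+3 and the top word at A+2):
# Before: load the full 32 bits, then shift to isolate the top byte
movl (%rax),%edx
shrl $24,%edx
# After: a single zero-extending load of just that byte
movzbl 3(%rax),%edx
# Signed variant: a MOV/SAR pair becomes a sign-extending load
movl (%rax),%edx
sarl $24,%edx
# After:
movsbl 3(%rax),%edx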
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Many new optimisations should now fire when reading the most-significant byte or word of a 32-bit value in memory, among other cases.
Relevant logs and/or screenshots
A large number of files receive improvements - here's an example from aasmcpu (x86_64-win64) where a cascade of optimisations occurs under -O4. Before:
.Lj285:
...
# Register eax allocated
movl 8(%r14),%eax
shrl $24,%eax
andl $255,%eax
# Register rflags allocated
cmpb $4,%al
# Register al released
After:
.Lj285:
...
# Register eax allocated
# Peephole Optimization: movl 8(%r14),%eax; shrl $24,%eax -> movzxbl 11(%r14),%eax (MovShr/Sar2Movx)
# Peephole Optimization: MovzAnd2Movz1
movzbl 11(%r14),%eax
# Register rflags allocated
cmpb $4,%al
# Register al released
(The optimisations in OptPass2Movx should be improved in the future to detect that only the lowest byte of %eax is used, so the MOVZBL instruction can be reduced to movb 11(%r14),%al, which can in turn be folded into the CMP instruction to give just cmpb $4,11(%r14).)
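For reference, a sketch of where that future work could lead (hypothetical output, not produced by this patch; the instructions are taken from the note above):
# Output of this patch:
movzbl 11(%r14),%eax
cmpb $4,%al
# If the peephole learns that only %al is consumed:
movb 11(%r14),%al
cmpb $4,%al
# And with the load folded into the comparison:
cmpb $4,11(%r14)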
Additional notes
There is a fair amount of overlap with the optimisations performed in !278 (merged).