[Cross-platform] Sub-register (field access) optimisation
Summary
Following on from some comments in !282 (merged), this merge request updates the a_load_subsetreg_reg
code generation routine so that it no longer emits an AND instruction after a SHR instruction when the AND is unnecessary (i.e. there are no bits to the left of the field that need masking out).
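The condition can be illustrated with a small Python model. This is only a sketch of the idea, not the compiler's actual routine: `load_subset_field` and its parameters are hypothetical names, and it assumes an unsigned field held in a register of `reg_bits` bits.

```python
def load_subset_field(value, start_bit, field_bits, reg_bits=32):
    """Model reading an unsigned bit field out of a register.

    Returns (field_value, ops), where ops lists the modelled instructions.
    All names here are hypothetical; this is not the compiler's routine.
    """
    value &= (1 << reg_bits) - 1          # keep the value register-sized
    ops = []
    if start_bit > 0:
        value >>= start_bit               # logical shift: zeros enter from the left
        ops.append("shr")
    # Bits to the left of the field survive the shift only if the field
    # stops short of the top of the register; only then is a mask needed.
    if start_bit + field_bits < reg_bits:
        value &= (1 << field_bits) - 1
        ops.append("and")
    return value, ops


print(load_subset_field(0xAABBCCDD, 8, 8))    # middle byte: shift, then mask
print(load_subset_field(0xAABBCCDD, 24, 8))   # top byte: the shift alone suffices
```

When the field ends at the most significant bit of the register, the logical shift has already filled everything to its left with zeros, so the mask would be an identity operation.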
System
- Operating system: All
- Processor architecture: All (although x86 has an additional, related peephole optimisation)
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
Unnecessary AND instructions are no longer generated in code that reads structure fields.
Additional notes
A related optimisation in the x86 peephole optimizer was adjusted, since the above change caused a single inefficiency in the cgobj
assembly. The adjustment ended up improving generated code elsewhere. For example, in the blowfish unit under x86_64-win64
with -O3 - before:
.section .text.n_blowfish$_$tblowfish_$__$$_f$longword$$longword,"ax"
.balign 16,0x90
.globl BLOWFISH$_$TBLOWFISH_$__$$_F$LONGWORD$$LONGWORD
BLOWFISH$_$TBLOWFISH_$__$$_F$LONGWORD$$LONGWORD:
.Lc6:
movl %edx,%r8d
andl $255,%r8d
shrl $8,%edx
movl %edx,%r9d
andl $255,%r9d
shrl $8,%edx
movl %edx,%eax
andl $255,%eax
shrl $8,%edx
# Peephole Optimization: AndMovzToAnd done
andl $255,%edx
movzbl %al,%eax
...
After (the peephole optimizer tracks %edx through all of the SHR instructions and determines that the final AND
instruction is unnecessary, since the three 8-bit shifts leave at most 8 significant bits in the 32-bit register):
.section .text.n_blowfish$_$tblowfish_$__$$_f$longword$$longword,"ax"
.balign 16,0x90
.globl BLOWFISH$_$TBLOWFISH_$__$$_F$LONGWORD$$LONGWORD
BLOWFISH$_$TBLOWFISH_$__$$_F$LONGWORD$$LONGWORD:
.Lc6:
movl %edx,%r8d
andl $255,%r8d
shrl $8,%edx
movl %edx,%r9d
andl $255,%r9d
shrl $8,%edx
movl %edx,%eax
andl $255,%eax
shrl $8,%edx
# Peephole Optimization: AndMovzToAnd done
# Peephole Optimization: Removed AND instruction since previous SHR makes this an identity operation (ShrAnd2Shr)
movzbl %al,%eax
...
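The equivalence of the two listings can be checked with a small Python model of just the instructions shown (the elided remainder is not modelled). This is a sketch for illustration only; `byte_split` is a hypothetical name, and registers are modelled as plain 32-bit unsigned integers.

```python
import random

MASK32 = 0xFFFFFFFF


def byte_split(edx, keep_final_and):
    """Model the listing on a 32-bit %edx; returns (r8d, r9d, eax, edx)."""
    edx &= MASK32
    r8d = edx & 255          # movl %edx,%r8d ; andl $255,%r8d
    edx >>= 8                # shrl $8,%edx
    r9d = edx & 255          # movl %edx,%r9d ; andl $255,%r9d
    edx >>= 8                # shrl $8,%edx
    eax = edx & 255          # movl %edx,%eax ; andl $255,%eax
    edx >>= 8                # shrl $8,%edx (24 bits shifted out in total)
    if keep_final_and:
        edx &= 255           # andl $255,%edx, present only in the "before" code
    eax &= 255               # movzbl %al,%eax
    return r8d, r9d, eax, edx


random.seed(0)
samples = [0, 1, 0xFF, 0x12345678, MASK32]
samples += [random.getrandbits(32) for _ in range(1000)]

# Both variants must leave every modelled register in the same state.
assert all(byte_split(v, True) == byte_split(v, False) for v in samples)
print(byte_split(0x12345678, False))
```

After 24 bits have been shifted out of a 32-bit register, the remaining value cannot exceed 255, which is why masking it with 255 changes nothing.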
In sfpu128, a redundant zero-extension is stripped - before:
.globl SFPU128_$$_MUL32TO64$LONGWORD$LONGWORD$LONGWORD$LONGWORD
SFPU128_$$_MUL32TO64$LONGWORD$LONGWORD$LONGWORD$LONGWORD:
.Lc108:
.seh_proc SFPU128_$$_MUL32TO64$LONGWORD$LONGWORD$LONGWORD$LONGWORD
pushq %rbx
.seh_pushreg %rbx
.Lc109:
.seh_endprologue
movw %cx,%ax
shrl $16,%ecx
movw %dx,%r10w
shrl $16,%edx
movzwl %ax,%r11d
movzwl %r10w,%ebx
imull %ebx,%r11d
movzwl %ax,%eax
movzwl %dx,%ebx
imull %ebx,%eax
movl %ecx,%ebx
movzwl %r10w,%r10d
imull %r10d,%ebx
movzwl %cx,%ecx
movzwl %dx,%edx
imull %edx,%ecx
...
After:
.globl SFPU128_$$_MUL32TO64$LONGWORD$LONGWORD$LONGWORD$LONGWORD
SFPU128_$$_MUL32TO64$LONGWORD$LONGWORD$LONGWORD$LONGWORD:
.Lc108:
.seh_proc SFPU128_$$_MUL32TO64$LONGWORD$LONGWORD$LONGWORD$LONGWORD
pushq %rbx
.seh_pushreg %rbx
.Lc109:
.seh_endprologue
movw %cx,%ax
shrl $16,%ecx
movw %dx,%r10w
shrl $16,%edx
movzwl %ax,%r11d
movzwl %r10w,%ebx
imull %ebx,%r11d
movzwl %ax,%eax
movzwl %dx,%ebx
imull %ebx,%eax
movl %ecx,%ebx
movzwl %r10w,%r10d
imull %r10d,%ebx
; (Instruction "movzwl %cx,%ecx" gets stripped)
movzwl %dx,%edx
imull %edx,%ecx
...
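The stripped `movzwl %cx,%ecx` follows the same reasoning: %ecx was previously shifted right by 16 and is not written again before the zero-extension, so its upper 16 bits are already zero. A minimal Python sketch of that argument follows; the helper names are hypothetical and merely model the instructions, they are not taken from the compiler.

```python
MASK32 = 0xFFFFFFFF


def shrl(value, count):
    """Logical right shift of a 32-bit register."""
    return (value & MASK32) >> count


def movzwl(value):
    """Zero-extend the low 16 bits into a 32-bit register."""
    return value & 0xFFFF


def zero_extend_is_redundant(shift, src_bits, reg_bits=32):
    """True if a prior logical shift already cleared every bit above src_bits."""
    return reg_bits - shift <= src_bits


# After "shrl $16,%ecx", zero-extending %cx into %ecx is an identity operation.
for ecx in (0, 0xFFFF, 0x12345678, MASK32):
    shifted = shrl(ecx, 16)
    assert movzwl(shifted) == shifted

print(zero_extend_is_redundant(16, 16))   # shift of 16 on a 32-bit register
print(zero_extend_is_redundant(8, 16))    # shift of 8 leaves 24 significant bits
```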