[x86] Fixed inefficiency in var9 optimisation under -Os (!296)

J. Gareth "Kit" Moreton requested to merge CuriousKit/optimisations:movzx-and-Os-fix into main Oct 13, 2022

Summary

This merge request fixes an inefficiency in the "var9" optimisation (which converts movzbl %regb,%regl to andl $255,%regl) in OptPass1Movx. Normally this is not performed under -Os, but if the register was %eax, it was still permitted on the assumption that the resulting machine code is smaller than a MOVZX instruction. That assumption is incorrect: the AND encodings that take a register and an 8-bit immediate sign-extend the immediate, so they can only represent values between -128 and 127. As a result, the "255" in andl $255,%eax has to be stored as a 32-bit immediate, and thus the entire instruction takes 5 bytes, compared with 3 bytes for movzbl %al,%eax.
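The size argument above can be checked with a small model of the relevant encodings. This is only an illustrative sketch in Python, not code from the FPC sources: the helper names (to_signed32, fits_simm8, andl_imm_size) are hypothetical, and it assumes register-direct operands, the legacy 32-bit registers and no prefix bytes.

```python
# Illustrative size model, not FPC code. Assumes register-direct operands,
# legacy 32-bit registers and no prefix bytes.

MOVZBL_SIZE = 3  # movzbl %regb,%regl: two-byte opcode 0F B6 + ModRM


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def fits_simm8(value: int) -> bool:
    """True if the 32-bit immediate can be stored as a sign-extended byte."""
    return -128 <= to_signed32(value) <= 127


def andl_imm_size(imm: int, reg_is_eax: bool) -> int:
    """Encoded length in bytes of andl $imm,%reg (hypothetical helper)."""
    if fits_simm8(imm):
        return 3  # 83 /4 ib: opcode + ModRM + imm8
    if reg_is_eax:
        return 5  # 25 id: short %eax form, opcode + imm32
    return 6      # 81 /4 id: opcode + ModRM + imm32


if __name__ == "__main__":
    # 255 is outside -128..127, so even the short %eax form needs an imm32.
    print(andl_imm_size(255, reg_is_eax=True), MOVZBL_SIZE)
```

Under these assumptions the AND form is never smaller than MOVZX for a mask of 255, whichever 32-bit register is used, which is why the %eax exception under -Os gives no size benefit.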

System

Processor architecture: i386, x86_64

What is the current bug behavior?

Under -Os, movzbl %al,%eax gets changed to andl $255,%eax, which is a larger instruction in terms of machine code and also risks a partial write penalty.

What is the behavior after applying this patch?

Under -Os, movzbl %al,%eax, as well as any other register combinations, will remain as is.

Additional notes

Under -Os, movzbw %al,%ax will still get converted to andw $255,%ax, as the instruction is only 3 bytes in size. While this is the same size as a MOVZX instruction, it usually has more optimisation potential.
