[x86] New "Fast LEA" hint and new "ICELAKE" processor option
Summary
To facilitate more accurate optimisation involving `LEA` instructions, a new optimisation hint and new processor options for Ice Lake have been introduced to the compiler.
- `CPUX86_HINT_FAST_3COMP_ADDR` - this indicates that complex `LEA` instructions (commonly those that have a base, index and offset) are at least as fast as two `ADD` instructions in a dependency chain (`LEA` has a latency of at least 3 cycles on many Intel processors from the 2000s and 2010s).
- New processor options `ICELAKE`, `ICELAKE-CLIENT` (synonym for `ICELAKE` for GCC compatibility) and `ICELAKE-SERVER` for Ice Lake architectures.
- New benchmark test at `tests/bench/blea.pp` where `LEA` and `ADD` timings can be tested on a custom-made checksum routine.
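To make the trade-off behind the hint concrete, here is a minimal Python sketch of the latency comparison it encodes, for a calculation of the form `dst := base + index + offset`. The 3-cycle figure for a slow three-component `LEA` is the one quoted above; the 1-cycle `ADD` latency, the split `LEA` + `ADD` fallback sequence, and all names in the code (`CpuModel`, `fast_3comp_addr`, `choose_sequence`) are illustrative assumptions, not the compiler's actual internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuModel:
    """Hypothetical latency figures for one processor, in cycles."""
    name: str
    add_latency: int   # latency of a single ADD
    lea3_latency: int  # latency of an LEA with base, index and offset


def fast_3comp_addr(cpu: CpuModel) -> bool:
    """True when one three-component LEA is at least as fast as two ADDs
    in a dependency chain, i.e. the condition the hint describes."""
    return cpu.lea3_latency <= 2 * cpu.add_latency


def choose_sequence(cpu: CpuModel) -> list[str]:
    """Pick an instruction sequence for dst := base + index + offset."""
    if fast_3comp_addr(cpu):
        return ["lea dst, [base+index+offset]"]
    # Assumed fallback: split the calculation so that the slow
    # three-component LEA form is avoided.
    return ["lea dst, [base+index]", "add dst, offset"]


slow = CpuModel("slow-lea", add_latency=1, lea3_latency=3)
fast = CpuModel("fast-lea", add_latency=1, lea3_latency=1)

for cpu in (slow, fast):
    print(cpu.name, choose_sequence(cpu))
```

With these assumed figures, the slow model gets the two-instruction sequence (2 cycles instead of 3), while the fast model keeps the single `LEA`.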
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
The flags are not currently used, but their use is planned for !134 and !501 (merged).
Additional notes
Based on user tests and Agner Fog's reports, the flags are assigned to the available processors as follows:
Processor       Fast LEA (32/64-bit)
--------------  --------------------
80386           *
80486           *
PENTIUM         *
PENTIUM2        *
PENTIUM3        *
PENTIUM4
PENTIUMM        *
ATHLON          *
COREI           *
BOBCAT          *
COREAVX
JAGUAR          *
PILEDRIVER      *
EXCAVATOR       *
COREAVX2
ZEN             *
ZEN2            *
ICELAKE         *
ICELAKE-CLIENT  *
ICELAKE-SERVER  *
ZEN3            *
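The table can be read as a simple lookup. The Python sketch below transcribes the starred rows into a set and resolves the `ICELAKE-CLIENT` synonym before looking the name up. The names `FAST_3COMP_ADDR`, `SYNONYMS` and `has_fast_3comp_addr` are hypothetical, and this is not how the compiler itself stores its per-processor flags; unknown processor names simply report no hint.

```python
# Processors marked with CPUX86_HINT_FAST_3COMP_ADDR in the table above.
FAST_3COMP_ADDR = {
    "80386", "80486", "PENTIUM", "PENTIUM2", "PENTIUM3", "PENTIUMM",
    "ATHLON", "COREI", "BOBCAT", "JAGUAR", "PILEDRIVER", "EXCAVATOR",
    "ZEN", "ZEN2", "ICELAKE", "ICELAKE-SERVER", "ZEN3",
}

# ICELAKE-CLIENT is a synonym for ICELAKE (for GCC compatibility).
SYNONYMS = {"ICELAKE-CLIENT": "ICELAKE"}


def has_fast_3comp_addr(processor: str) -> bool:
    """Return True if the named processor option carries the hint."""
    name = SYNONYMS.get(processor, processor)
    return name in FAST_3COMP_ADDR
```

Keeping synonyms in a separate mapping means each architecture's flag is recorded once, so a synonym cannot drift out of step with the option it aliases.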
The Pentium 4 was an odd case: early versions had good 16-bit `LEA` speed but not good 32-bit speed, while Prescott Pentium 4s had poor latency all round. All Intel processors starting from Sandy Bridge (`COREAVX`) had poor `LEA` latency, but this was finally addressed in Ice Lake (`ICELAKE`). AMD processors generally had good 32/64-bit performance through all iterations, but at the cost of 16-bit performance.
Edited by J. Gareth "Kit" Moreton