[x86] New "Fast LEA" hint and new "ICELAKE" processor option
Summary
To facilitate more accurate optimisation involving LEA instructions, a new optimisation hint and a new processor option have been introduced to the compiler.
- CPUX86_HINT_FAST_3COMP_ADDR - this indicates that complex LEA instructions (commonly those that have a base, index and offset) are at least as fast as two ADD instructions in a dependency chain (LEA has a latency of at least 3 cycles on many Intel processors from the 2000s and 2010s).
- New processor options ICELAKE, ICELAKE-CLIENT (synonym for ICELAKE for GCC compatibility) and ICELAKE-SERVER for Ice Lake architectures.
- New benchmark test at tests/bench/blea.pp where LEA and ADD timings can be tested on a custom-made checksum routine.
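To make the LEA-versus-ADD trade-off concrete, here is a minimal C sketch of a checksum step written in two arithmetically identical forms. It is not the routine in tests/bench/blea.pp; the function names and the constant 0x9E37 are assumptions chosen purely for illustration. C source does not force instruction selection, so the comments only describe the x86 encodings each form corresponds to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum step written as one three-component sum
   (accumulator + data byte + constant). On x86 this shape maps onto a
   single complex LEA, e.g. lea eax, [eax + edx + 0x9E37]. */
uint32_t checksum_lea_form(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum + data[i] + UINT32_C(0x9E37);
    return sum;
}

/* The same arithmetic split into two dependent additions, which maps
   onto two ADD instructions in a dependency chain:
   add eax, edx / add eax, 0x9E37. */
uint32_t checksum_add_form(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];          /* first ADD: accumulate the byte */
        sum += UINT32_C(0x9E37); /* second ADD: depends on the first */
    }
    return sum;
}
```

Both functions return the same value for any input; only the instruction sequence on the critical path differs, and that difference is what the hint is meant to describe.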
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
The new flag is not currently used, but its use is planned for !134 and !501 (merged).
Additional notes
Based on user tests and Agner Fog's reports, the flag is assigned to the available processors as follows (* means the hint is set):
Processor        Fast LEA (32/64-bit)
---------------  --------------------
80386            *
80486            *
PENTIUM          *
PENTIUM2         *
PENTIUM3         *
PENTIUM4
PENTIUMM         *
ATHLON           *
COREI            *
BOBCAT           *
COREAVX
JAGUAR           *
PILEDRIVER       *
EXCAVATOR        *
COREAVX2
ZEN              *
ZEN2             *
ICELAKE          *
ICELAKE-CLIENT   *
ICELAKE-SERVER   *
ZEN3             *
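As a rough illustration of how such an assignment can be consulted, the sketch below encodes the table as a per-processor bitmask lookup in C. This is not the compiler's actual Pascal source: the enum, the array name cpu_hints and the helper cpu_has_fast_3comp_addr are hypothetical, and only the flag name CPUX86_HINT_FAST_3COMP_ADDR and the table contents come from this description.

```c
#include <stdbool.h>

/* Hypothetical processor enumeration, in the same order as the table. */
enum cpu_type {
    CPU_80386, CPU_80486, CPU_PENTIUM, CPU_PENTIUM2, CPU_PENTIUM3,
    CPU_PENTIUM4, CPU_PENTIUMM, CPU_ATHLON, CPU_COREI, CPU_BOBCAT,
    CPU_COREAVX, CPU_JAGUAR, CPU_PILEDRIVER, CPU_EXCAVATOR, CPU_COREAVX2,
    CPU_ZEN, CPU_ZEN2, CPU_ICELAKE, CPU_ICELAKE_CLIENT, CPU_ICELAKE_SERVER,
    CPU_ZEN3, CPU_COUNT
};

/* Flag name taken from the description; the bit position is an assumption. */
#define CPUX86_HINT_FAST_3COMP_ADDR (1u << 0)

/* Entries not listed (PENTIUM4, COREAVX, COREAVX2) default to 0,
   meaning the hint is not set for them. */
static const unsigned cpu_hints[CPU_COUNT] = {
    [CPU_80386]          = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_80486]          = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_PENTIUM]        = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_PENTIUM2]       = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_PENTIUM3]       = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_PENTIUMM]       = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_ATHLON]         = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_COREI]          = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_BOBCAT]         = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_JAGUAR]         = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_PILEDRIVER]     = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_EXCAVATOR]      = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_ZEN]            = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_ZEN2]           = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_ICELAKE]        = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_ICELAKE_CLIENT] = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_ICELAKE_SERVER] = CPUX86_HINT_FAST_3COMP_ADDR,
    [CPU_ZEN3]           = CPUX86_HINT_FAST_3COMP_ADDR,
};

/* Returns true when the target processor is marked as having fast
   three-component LEA, i.e. a complex LEA is preferable to two ADDs. */
bool cpu_has_fast_3comp_addr(enum cpu_type cpu)
{
    return (cpu_hints[cpu] & CPUX86_HINT_FAST_3COMP_ADDR) != 0;
}
```

An optimisation pass could then test cpu_has_fast_3comp_addr(target) before folding two dependent ADD instructions into a single LEA, and leave them split on processors such as COREAVX where the hint is absent.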
The Pentium 4 was a bit of an odd case: early versions had good 16-bit speed but not 32-bit speed, while Prescott Pentium 4s had poor latency all round. Intel processors from Sandy Bridge (COREAVX) onwards had poor latency for complex LEA instructions, and this was only addressed in Ice Lake (ICELAKE). AMD processors have generally had good 32/64-bit performance through all iterations, but at the cost of 16-bit performance.
Edited by J. Gareth "Kit" Moreton