
Jump over some LOCK prefixes instead of having LOCK and non-LOCK branches.

Rika requested to merge runewalsh/source:lock into main

First, slightly improve i386.inc:fpc_AnsiStr_Decr_Ref: use scratch edx instead of nonvolatile esi, inline cpudeclocked, do what the title says, and tail-call FPC_FREEMEM. (I’d save one more level of indirection with {$ifndef FPC_PIC} jmp MemoryManager.FreeMem {$endif}, but that would require making MemoryManager accessible from there first...)

Second, jump over the LOCK prefixes in x86_64.inc:inclocked/declocked.

You may well have doubts: is it okay to jump over the LOCK prefix, essentially into the middle of the prefixed instruction? The short answer is “yes”. I found at least this example, where the first line of the disassembly does exactly that:

80ade23:   74 01         je 0x80ade26
80ade25:   f0 0f c1 16   lock xadd %edx,(%esi)

and a mention in Agner Fog’s “Optimizing subroutines in assembly language” that can be allegorically read as very vague advice against such jumps:

confused
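For anyone who wants to double-check the disassembly quoted above by hand, here is a small sketch of the arithmetic. It only uses the addresses and bytes from those two lines; the helper name `short_jump_target` is made up for illustration, and the only assumption is the usual x86 rule that a short jump's rel8 is a signed offset from the address of the next instruction.

```python
# Addresses and bytes copied from the two disassembly lines quoted above.
JE_ADDR = 0x80ADE23
JE_BYTES = bytes([0x74, 0x01])                 # je rel8
LOCK_ADDR = 0x80ADE25
LOCK_BYTES = bytes([0xF0, 0x0F, 0xC1, 0x16])   # lock xadd %edx,(%esi)


def short_jump_target(addr, code):
    """Hypothetical helper: target of a 2-byte short jump.

    The signed rel8 is added to the address of the *next* instruction.
    """
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    return addr + len(code) + rel8


target = short_jump_target(JE_ADDR, JE_BYTES)

# How far into the LOCK-prefixed instruction the jump lands.
offset_into_lock = target - LOCK_ADDR
skipped = LOCK_BYTES[:offset_into_lock]        # bytes jumped over
remaining = LOCK_BYTES[offset_into_lock:]      # bytes executed after the jump

print(hex(target), skipped.hex(), remaining.hex())
```

The jump lands exactly one byte past the start of the prefixed instruction, so the only byte skipped is the F0 prefix, and what remains (0F C1 16) is the same xadd without the LOCK.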

In total, this MR saves ~16 bytes of code in each of the 5 functions it touches, and 35 LoC overall.

Edited by Rika
