[Refactor] Label reference count corrections
Summary
This merge request removes a side-effect in the compiler where reading the name property of a TAsmLabel object would increase its reference count. This caused incorrect reference count values to appear in the peephole optimizers of some platforms, just because the compiler needed the label's name for something unrelated.
Some parts of the compiler depend on the reference count being increased like this; explicit calls to increfs have been added to accommodate them. As a result, compiler maintenance should be easier, code will be smaller and faster due to the stripping of dead labels that were incorrectly referenced (and the fact that the overridden TAsmLabel.GetName method has been removed), and future peephole optimisations should be more accurate.
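The change described above can be modelled in a few lines. This is a minimal Python sketch, not the compiler's Pascal code: the class names LegacyLabel and Label and the helper emit_reference are hypothetical, and only the ideas of a name property and an increfs call come from this merge request.

```python
class LegacyLabel:
    """Model of the old behaviour: reading the name bumps the count."""

    def __init__(self, name):
        self._name = name
        self.refs = 0

    @property
    def name(self):
        self.refs += 1  # hidden side effect of a plain read
        return self._name


class Label:
    """Model of the new behaviour: reading the name is side-effect free."""

    def __init__(self, name):
        self._name = name
        self.refs = 0

    @property
    def name(self):
        return self._name

    def increfs(self):
        self.refs += 1


def emit_reference(label):
    """Hypothetical caller that really does create a reference."""
    label.increfs()  # the increment is now explicit and visible
    return label.name


legacy = LegacyLabel(".Lj1")
fixed = Label(".Lj1")
for _ in range(3):
    _ = legacy.name  # three unrelated reads, e.g. for debug output
    _ = fixed.name
emit_reference(fixed)

print(legacy.refs, fixed.refs)  # prints: 3 1
```

In this model the legacy label ends up with three references although nothing jumps to it, while the fixed label has exactly the one reference that was requested explicitly.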
System
- Processor architecture: All, but AArch64 is notably affected
What is the current bug behavior?
Label reference counts are inflated, causing inefficient code because the labels cannot be stripped and nearby jumps cannot be optimised.
What is the behavior after applying this patch?
On AArch64 especially, label reference counts should be correct and code generation greatly improved.
Additional notes
As shown by assembly dumps when DEBUG_LABEL is defined, reference counts for labels are too high in some situations, notably the following:
- When the -a option is specified to save assembly dumps, a label's reference count is increased every time its name is printed, causing it to be higher than expected by the time the label itself is printed.
- AArch64 requires access to the label's name as part of its a_jmp_always routine and similar instructions. This causes a label's reference count to increase twice for each jump: once due to the retrieval of the label's name, and once as part of the loadref routine.
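The arithmetic behind the two situations above can be written out as a small tally. This is only an illustrative Python sketch: the function observed_refs and its parameters are hypothetical, and it assumes exactly one name read per jump on AArch64, as the description of a_jmp_always suggests.

```python
def observed_refs(jumps, dump_prints, name_read_counts):
    """Reference count the optimizer would see for one label.

    jumps            -- real jumps to the label (one loadref each)
    dump_prints      -- times the name was printed to the assembly dump
                        before the count is inspected
    name_read_counts -- True models the old side-effecting name read
    """
    real = jumps  # one genuine reference per jump, via loadref
    if not name_read_counts:
        return real
    # Old behaviour: one extra per jump (name retrieval) plus one per print.
    return real + jumps + dump_prints


print(observed_refs(1, 0, True))   # prints: 2
print(observed_refs(1, 1, True))   # prints: 3
print(observed_refs(1, 1, False))  # prints: 1
```

Under these assumptions a label with a single jump is seen with two references, or three once its name has also been written to a dump, whereas the corrected count is one.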
Relevant logs and/or screenshots
Since this is a refactor, most platforms won't see any improvement in code generation, but AArch64 is affected significantly more for the reasons mentioned above. With this side-effect corrected, labels can be stripped and jumps optimised far better. For example, in the compiler's aopt unit under -O4 for aarch64-linux (Raspberry Pi OS), before:
...
.Lj18:
mov x1,sp
ldr x0,[sp]
bl AOPTBASE$_$TAOPTBASE_$__$$_GETNEXTINSTRUCTION$TAI$TAI$$BOOLEAN
.Lj15:
ldr x1,[sp]
cbz x1,.Lj16
ldrb w0,[x1, #32]
cmp w0,#21
b.ne .Lj14
.Lj25:
ldrb w0,[x1, #40]
cmp w0,#2
b.ne .Lj14
.Lj26:
b .Lj16
.Lj23:
b .Lj14
.Lj22:
.Lj16:
ldr x0,[sp]
...
After:
...
.Lj18:
mov x1,sp
ldr x0,[sp]
bl AOPTBASE$_$TAOPTBASE_$__$$_GETNEXTINSTRUCTION$TAI$TAI$$BOOLEAN
.Lj15:
ldr x1,[sp]
cbz x1,.Lj22
ldrb w0,[x1, #32]
cmp w0,#21
b.ne .Lj14
ldrb w0,[x1, #40]
cmp w0,#2
b.ne .Lj14
.Lj22:
ldr x0,[sp]
...
Thanks to the labels now being correctly marked as dead (0 references), RemoveDeadCodeAfterJump and CollapseZeroDistJump can remove much more code.
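To see why the reference counts matter here, the following toy Python pass mimics the general idea on a simplified version of the listing above. It is a sketch under stated assumptions, not the compiler's RemoveDeadCodeAfterJump or CollapseZeroDistJump: the tuple encoding, the function names count_refs and optimise, and the choice of one spurious reference per unused label are all hypothetical.

```python
def count_refs(code):
    """True reference counts: one per jump that targets the label."""
    refs = {arg: 0 for kind, arg in code if kind == "label"}
    for kind, arg in code:
        if kind in ("jmp", "cjmp"):
            refs[arg] = refs.get(arg, 0) + 1
    return refs


def optimise(code, refs):
    """Toy peephole loop over (kind, arg) tuples, run to a fixed point."""
    code, refs = list(code), dict(refs)
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(code):
            kind, arg = code[i]
            if kind == "label" and refs.get(arg, 0) == 0:
                changed = True  # dead label: drop it
            elif kind == "jmp":
                # Code between an unconditional jump and the next label
                # can never run, so skip it.
                j = i + 1
                while j < len(code) and code[j][0] != "label":
                    if code[j][0] in ("jmp", "cjmp"):
                        refs[code[j][1]] -= 1
                    j += 1
                    changed = True
                # Zero-distance jump: the target label follows directly.
                k, zero = j, False
                while k < len(code) and code[k][0] == "label":
                    zero = zero or code[k][1] == arg
                    k += 1
                if zero:
                    refs[arg] -= 1
                    changed = True
                else:
                    out.append(code[i])
                i = j
                continue
            else:
                out.append(code[i])
            i += 1
        code = out
    return code


before = [
    ("cjmp", ".Lj16"), ("op", "ldrb w0,[x1, #32]"), ("cjmp", ".Lj14"),
    ("label", ".Lj25"), ("op", "ldrb w0,[x1, #40]"), ("cjmp", ".Lj14"),
    ("label", ".Lj26"), ("jmp", ".Lj16"),
    ("label", ".Lj23"), ("jmp", ".Lj14"),
    ("label", ".Lj22"), ("label", ".Lj16"), ("op", "ldr x0,[sp]"),
]
true_refs = count_refs(before)
# Assumption: each unused label carries one spurious reference.
inflated = {n: c + (1 if c == 0 else 0) for n, c in true_refs.items()}

good = optimise(before, true_refs)
bad = optimise(before, inflated)
print(len(before), len(bad), len(good))  # prints: 13 13 7
```

With inflated counts every label looks live, so neither unconditional jump can be removed and the listing is unchanged. With correct counts the unused labels disappear, the second jump becomes unreachable, and the first jump collapses into a fall-through, which is the same shape of result as the listing above.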