
[Refactor] Label reference count corrections

Summary

This merge request removes a side effect in the compiler whereby reading the name property of a TAsmLabel object increased its reference count. Because of it, the peephole optimisers of some platforms saw incorrect reference counts simply because the compiler had needed the label's name for something unrelated.

Some parts of the compiler depend on the reference count being increased in this way; explicit calls to increfs have been added to accommodate them. As a result, compiler maintenance should be easier, code will be smaller and faster because dead labels that were incorrectly counted as referenced are now stripped (and because the overridden TAsmLabel.GetName method has been removed), and future peephole optimisations should be more accurate.

System

  • Processor architecture: All, but AArch64 is notably affected

What is the current bug behavior?

Label reference counts are inflated, which produces inefficient code because the affected labels cannot be stripped and nearby jumps cannot be optimised.

What is the behavior after applying this patch?

On AArch64 especially, label reference counts should be correct and code generation greatly improved.

Additional notes

As shown by assembly dumps produced when DEBUG_LABEL is defined, reference counts for labels are too high in some situations, notably the following:

  • When the -a option is specified to save assembly dumps, a label's reference count is increased every time its name is printed, causing it to be higher than expected by the time the label itself is printed.
  • AArch64 requires access to the label's name as part of its a_jmp_always routine and similar instructions. This causes a label's reference count to increase twice for each jump: once when the label's name is retrieved, and once as part of the loadref routine.
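
The two situations above can be illustrated with a small model. This is only a sketch in Python, not the compiler's Pascal code: LegacyLabel, PlainLabel, emit_jump and dump_name are hypothetical names, and the single increfs call inside emit_jump merely stands in for the reference registered by loadref.

```python
class LegacyLabel:
    """Toy model of the old behaviour: reading the name bumps the count."""

    def __init__(self, name):
        self._name = name
        self.refs = 0

    @property
    def name(self):
        self.refs += 1          # hidden side effect of a plain read
        return self._name

    def increfs(self):
        self.refs += 1


class PlainLabel:
    """Toy model of the new behaviour: only increfs changes the count."""

    def __init__(self, name):
        self.name = name        # reading this has no side effect
        self.refs = 0

    def increfs(self):
        self.refs += 1


def emit_jump(label):
    """Hypothetical jump emitter: needs the name, then records one reference."""
    text = "\tb\t" + label.name
    label.increfs()             # stands in for the reference added by loadref
    return text


def dump_name(label, times):
    """Hypothetical assembly dump that prints the label's name several times."""
    return [label.name for _ in range(times)]


old, new = LegacyLabel(".Lj16"), PlainLabel(".Lj16")
emit_jump(old)
emit_jump(new)
print(old.refs, new.refs)       # 2 1: one jump, but the legacy label counts it twice
```

In this model, every further read of the name (for example by dump_name) keeps inflating the legacy count, while the plain label's count only changes where increfs is called explicitly.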

Relevant logs and/or screenshots

Because this is a refactor, most platforms won't see any improvement in code generation. AArch64 is affected significantly more for the reasons mentioned above; with this side effect corrected, labels can be stripped and jumps optimised far better. For example, in the compiler's aopt unit under -O4 for aarch64-linux (Raspberry Pi OS), before:

        ...
.Lj18:
	mov	x1,sp
	ldr	x0,[sp]
	bl	AOPTBASE$_$TAOPTBASE_$__$$_GETNEXTINSTRUCTION$TAI$TAI$$BOOLEAN
.Lj15:
	ldr	x1,[sp]
	cbz	x1,.Lj16
	ldrb	w0,[x1, #32]
	cmp	w0,#21
	b.ne	.Lj14
.Lj25:
	ldrb	w0,[x1, #40]
	cmp	w0,#2
	b.ne	.Lj14
.Lj26:
	b	.Lj16
.Lj23:
	b	.Lj14
.Lj22:
.Lj16:
	ldr	x0,[sp]
        ...

After:

        ...
.Lj18:
	mov	x1,sp
	ldr	x0,[sp]
	bl	AOPTBASE$_$TAOPTBASE_$__$$_GETNEXTINSTRUCTION$TAI$TAI$$BOOLEAN
.Lj15:
	ldr	x1,[sp]
	cbz	x1,.Lj22
	ldrb	w0,[x1, #32]
	cmp	w0,#21
	b.ne	.Lj14
	ldrb	w0,[x1, #40]
	cmp	w0,#2
	b.ne	.Lj14
.Lj22:
	ldr	x0,[sp]
        ...

Thanks to the labels now being correctly marked as dead (0 references), RemoveDeadCodeAfterJump and CollapseZeroDistJump can remove much more code.
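
Why correct counts unlock these removals can be seen in a toy model. The Python sketch below is not the implementation of RemoveDeadCodeAfterJump or CollapseZeroDistJump; the instruction tuples, the label names (.La, .Lb, .Lc, .Lend, .Lout) and both reference-count tables are made up for illustration, loosely following the shape of the listing above.

```python
def optimise(code, refs):
    """Toy peephole loop over (kind, arg) tuples, run until nothing changes.

    kind is "label", "jmp" (unconditional), "cjmp" (conditional) or "op".
    refs maps label names to reference counts; both inputs are copied.
    """
    code, refs = list(code), dict(refs)
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(code):
            kind, arg = code[i]
            if kind == "label" and refs.get(arg, 0) == 0:
                del code[i]                     # strip a dead label
                changed = True
                continue
            if kind == "jmp":
                # Zero-distance jump: only labels lie between jump and target.
                j = i + 1
                while (j < len(code) and code[j][0] == "label"
                       and code[j][1] != arg):
                    j += 1
                if j < len(code) and code[j] == ("label", arg):
                    refs[arg] -= 1
                    del code[i]
                    changed = True
                    continue
                # Anything after an unconditional jump, up to the next label,
                # is unreachable.
                if i + 1 < len(code) and code[i + 1][0] != "label":
                    dead_kind, dead_arg = code[i + 1]
                    if dead_kind in ("jmp", "cjmp"):
                        refs[dead_arg] = refs.get(dead_arg, 0) - 1
                    del code[i + 1]
                    changed = True
                    continue
            i += 1
    return code, refs


CODE = [
    ("cjmp", ".Lend"), ("label", ".La"), ("op", "cmp w0,#2"),
    ("cjmp", ".Lout"), ("label", ".Lb"), ("jmp", ".Lend"),
    ("label", ".Lc"), ("jmp", ".Lout"), ("label", ".Lend"),
    ("op", "ldr x0,[sp]"),
]
# Counts as they would be if only real jumps were counted (assumed values).
CORRECT = {".La": 0, ".Lb": 0, ".Lc": 0, ".Lend": 2, ".Lout": 2}
# The same labels with every count inflated (assumed values).
INFLATED = {".La": 1, ".Lb": 1, ".Lc": 1, ".Lend": 4, ".Lout": 4}

print(len(optimise(CODE, INFLATED)[0]))   # 10: nothing can be removed
print(len(optimise(CODE, CORRECT)[0]))    # 5: dead labels and jumps are gone
```

With inflated counts, the unreferenced labels stay in place and act as barriers, so neither the unreachable jump nor the jump to the immediately following label can be removed in this model. With correct counts the labels disappear first, and the remaining passes then cascade.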
