[x86] Unnecessary CMP and TEST instructions stripped
Summary
This merge request analyses the jumps that follow CMP and TEST instructions more closely, to see whether, once everything has been optimised, the jump ends up as a zero-distance jump (a jump to the label immediately after it) that is subsequently removed. If so, the original CMP/TEST instruction is unnecessary.
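To illustrate the idea, here is a minimal Python sketch over a toy instruction list. It is a hypothetical model, not FPC's tai list or its actual routines. It first removes zero-distance jumps, then drops any CMP/TEST whose flags are no longer read on the fall-through path. The liveness check is deliberately conservative. It follows jmp only to labels inside the fragment. It treats ret and flag-writing instructions as killing the flags, including call, since flags are not preserved across calls. Otherwise it assumes the flags are live.

```python
# Hypothetical instruction model: tuples of (opcode, *operands);
# labels are ("label", name). These are not FPC's data structures.
FLAG_WRITERS = {"cmp", "test", "add", "sub", "and", "or", "xor", "call"}  # call: flags not preserved


def reads_flags(op):
    # Conditional jumps, SETcc and CMOVcc consume the flags; jmp does not.
    return (op.startswith("j") and op != "jmp") or op.startswith(("set", "cmov"))


def is_zero_distance(code, i):
    """True if the jump at code[i] reaches its target label with only labels in between."""
    target = code[i][1]
    for ins in code[i + 1:]:
        if ins[0] != "label":
            return False
        if ins[1] == target:
            return True
    return False


def flags_live_after(code, i):
    """Conservatively decide whether the flags set by code[i] can be read on the fall-through path."""
    labels = {ins[1]: k for k, ins in enumerate(code) if ins[0] == "label"}
    k, seen = i + 1, set()
    while k < len(code):
        if k in seen:
            return False  # looped back without reading the flags
        seen.add(k)
        op = code[k][0]
        if reads_flags(op):
            return True
        if op in FLAG_WRITERS or op == "ret":
            return False
        if op == "jmp":
            if code[k][1] not in labels:
                return True  # unknown target: assume the flags are used there
            k = labels[code[k][1]]
        else:
            k += 1
    return True  # end of fragment: assume the flags are used afterwards


def strip_unused_compares(code):
    # Step 1: remove zero-distance jumps (labels are kept; dead-label removal is separate).
    code = [ins for i, ins in enumerate(code)
            if not (ins[0].startswith("j") and is_zero_distance(code, i))]
    # Step 2: drop CMP/TEST instructions whose flags are no longer read.
    return [ins for i, ins in enumerate(code)
            if not (ins[0] in ("cmp", "test") and not flags_live_after(code, i))]


fragment = [
    ("cmp", "44(%r14)", "%dil"),
    ("je", ".Lj109"),
    ("mov", "%r14", "%rcx"),
    ("call", "GETREFS"),
    ("test", "%eax", "%eax"),
    ("jne", ".Lj109"),  # hypothetical jump that has become zero-distance
    ("label", ".Lj109"),
    ("mov", "%dil", "44(%r14)"),
    ("ret",),  # hypothetical: the real code continues with jmp .Lj111
]
optimised = strip_unused_compares(fragment)
```

The cmp survives because je still reads its flags. The test goes because nothing reads its flags once the jne has been collapsed.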
Additionally, the Cmp1Jl2Cmp0Jle optimisation now catches the inverse case (the need for this became more prevalent as a result of the above optimisation).
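The name Cmp1Jl2Cmp0Jle suggests rewriting "cmp $1,x" followed by jl into "cmp $0,x" followed by jle. This Python sketch is a hedged sanity check, not FPC code. It simulates the ZF, SF and OF flags of an 8-bit CMP and confirms that both forms branch identically for every value. It also checks the negated pair, jge after cmp $1 versus jg after cmp $0. That pair is included on the assumption that it is what the "inverse case" refers to.

```python
def cmp_flags8(imm, x):
    """Flags after an 8-bit AT&T `cmp $imm, x`, which computes x - imm: (ZF, SF, OF)."""
    a, b = x & 0xFF, imm & 0xFF
    r = (a - b) & 0xFF
    zf = r == 0
    sf = bool(r & 0x80)
    of = bool((a ^ b) & (a ^ r) & 0x80)  # signed overflow of a - b
    return zf, sf, of


# Signed conditional jumps, defined by the flags they test.
CONDITIONS = {
    "jl": lambda zf, sf, of: sf != of,
    "jle": lambda zf, sf, of: zf or sf != of,
    "jge": lambda zf, sf, of: sf == of,
    "jg": lambda zf, sf, of: not zf and sf == of,
}


def equivalent(imm1, cc1, imm2, cc2):
    """True if `cmp $imm1,x; cc1` and `cmp $imm2,x; cc2` branch identically for every 8-bit x."""
    return all(
        CONDITIONS[cc1](*cmp_flags8(imm1, x)) == CONDITIONS[cc2](*cmp_flags8(imm2, x))
        for x in range(256)
    )
```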
Part refactor, part bug fix: the CollapseZeroDistJump method no longer deletes the label if it becomes dead, nor does it update the register tracking. The latter change means the function can be called safely on a tai entry that isn't the current instruction, and it also removes some awkward side-effect behaviour.
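The following is a minimal sketch of the new division of responsibility. It uses a hypothetical model in which labels carry a reference count. The names collapse_zero_dist_jump and remove_dead_labels are illustrative, not FPC's API. Collapsing a jump only updates the count. Deleting dead labels is left to a separate step that the caller runs when convenient.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # compare labels by identity
class Label:
    name: str
    refs: int = 0  # number of jumps that target this label


def collapse_zero_dist_jump(code, i):
    """Remove the jump at code[i] (hypothetical model, not FPC's API).

    Only the target label's reference count is decremented. The label stays in
    the list even if it becomes dead, and no register-tracking state is touched,
    so this can be called on an entry other than the one currently being optimised.
    """
    op, target = code[i]
    if not op.startswith("j"):
        raise ValueError(f"expected a jump at index {i}, got {op}")
    target.refs -= 1
    del code[i]
    return target


def remove_dead_labels(code):
    """Separate clean-up pass, run by the caller at a convenient time."""
    code[:] = [ins for ins in code if not (isinstance(ins, Label) and ins.refs == 0)]


lbl = Label(".Lj109", refs=2)
code = [("je", lbl), ("call", "GETREFS"), ("jne", lbl), lbl, ("mov", "%dil", "44(%r14)")]
collapse_zero_dist_jump(code, 2)  # the zero-distance jne goes; je still references lbl
remove_dead_labels(code)          # lbl survives because refs == 1
```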
System
- Processor architecture: i386, x86_64
What is the current bug behavior?
N/A
What is the behavior after applying this patch?
- Unnecessary CMP and TEST instructions are removed.
- TEST instructions are merged more frequently (this is slightly emergent behaviour(!), caused by the jumps being optimised in a different order).
Additional notes
- The CollapseZeroDistJump refactor was necessary to prevent phantom problems caused by the register tracking being updated and labels being removed at inconvenient times.
- Some jumps may change from jnb to jnc, or from jle to jng, again because the jumps are optimised in a different order. These changes are harmless because each pair is an alias for the same condition and assembles to the same machine code (the same flags are checked).
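This table shows why the changes are harmless. The mnemonics in each pair are aliases for the same condition code, so they share one opcode. The table lists the short-form (rel8) Jcc opcodes from the Intel SDM for the conditions involved. The near form is 0F 80+cc with the same condition nibble. The helper function is only an illustration.

```python
# Short-form (rel8) Jcc opcodes; aliases share one encoding.
JCC_REL8 = {
    "jb": 0x72, "jc": 0x72, "jnae": 0x72,
    "jnb": 0x73, "jnc": 0x73, "jae": 0x73,
    "je": 0x74, "jz": 0x74,
    "jne": 0x75, "jnz": 0x75,
    "jbe": 0x76, "jna": 0x76,
    "ja": 0x77, "jnbe": 0x77,
    "jl": 0x7C, "jnge": 0x7C,
    "jge": 0x7D, "jnl": 0x7D,
    "jle": 0x7E, "jng": 0x7E,
    "jg": 0x7F, "jnle": 0x7F,
}


def same_machine_code(a, b):
    """True if two Jcc mnemonics are aliases that assemble to the same opcode."""
    return JCC_REL8[a] == JCC_REL8[b]
```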
Relevant logs and/or screenshots
All logs were produced with -O4 on x86_64-win64.
A simple example of a TEST instruction being removed, in aasmdata. Before:
.section .text.n_aasmdata$_$tasmdata_$__$$_defineasmsymbolbyclassbase$hfnakgxhwz_o,"ax"
...
.Lj104:
movb %r12b,45(%r14)
cmpb 44(%r14),%dil
je .Lj109
movq %r14,%rcx
call AASMBASE$_$TASMSYMBOL_$__$$_GETREFS$$LONGINT
testl %eax,%eax ; <- Completely unnecessary TEST instruction, because the flags it sets are never used
.Lj109:
movb %dil,44(%r14)
jmp .Lj111
...
After:
.section .text.n_aasmdata$_$tasmdata_$__$$_defineasmsymbolbyclassbase$hfnakgxhwz_o,"ax"
...
.Lj104:
movb %r12b,45(%r14)
cmpb 44(%r14),%dil
je .Lj109
movq %r14,%rcx
call AASMBASE$_$TASMSYMBOL_$__$$_GETREFS$$LONGINT
.Lj109:
movb %dil,44(%r14)
jmp .Lj111
...
In the aasmcpu unit, thanks to a label not being marked as dead, a conditional branch is able to skip further ahead and avoid a second conditional check whose outcome is already known. Before:
.globl AASMCPU_$$_OPTIMIZE_REF$TREFERENCE$BOOLEAN
...
movb %dl,%sil
...
testb %dl,%dl
je .Lj953
call AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
cmpl 32(%rbx),%eax
je .Lj951
.Lj953:
testb %sil,%sil ; (%sil = %dl = 0 if we arrived here from je .Lj953, so jne .Lj950 will never branch)
jne .Lj950
movq %rbx,%rcx
call AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
...
After:
.globl AASMCPU_$$_OPTIMIZE_REF$TREFERENCE$BOOLEAN
...
movb %dl,%sil
...
testb %dl,%dl
je .Lj956
call AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
cmpl 32(%rbx),%eax
je .Lj951
testb %sil,%sil
jne .Lj950
.Lj956:
movq %rbx,%rcx
call AASMCPU_$$_GET_DEFAULT_SEGMENT_OF_REF$TREFERENCE$$TREGISTER
...
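The transformation in this example is a form of jump threading. Below is a rough Python sketch that rests on a few assumptions. It uses a toy tuple-based instruction list. A set of registers known to hold the same value as the tested register stands in for FPC's register tracking. The function thread_known_branch and the _threaded label suffix are hypothetical. The je is taken only when %dl is zero, and %sil holds a copy of %dl. The testb %sil,%sil / jne pair at the target therefore cannot branch on that path, so the je can be retargeted to the instruction after the jne.

```python
def thread_known_branch(code, i, copies):
    """Retarget `je L` at code[i] past a `test s,s; jne M` pair at L when the
    jne can never be taken on the branch-taken path (hypothetical sketch).

    Instructions are tuples (opcode, *operands); labels are ("label", name).
    `copies` is a set of registers assumed (e.g. by earlier register tracking)
    to hold the same value as the register tested before the je.
    """
    if i == 0 or code[i][0] != "je":
        return False
    prev, target = code[i - 1], code[i][1]
    if prev[0] != "test" or prev[1] != prev[2]:
        return False
    reg = prev[1]
    if ("label", target) not in code:
        return False
    at = code.index(("label", target))
    if at + 2 >= len(code):
        return False
    head, branch = code[at + 1], code[at + 2]
    if head[0] != "test" or head[1] != head[2] or head[1] not in copies | {reg}:
        return False
    if branch[0] != "jne":
        return False
    # On the taken path reg == 0, so its copy is 0 too and the jne never branches
    # (the flags also match, since both tests see a zero value).
    new_label = target + "_threaded"  # hypothetical label-naming scheme
    code.insert(at + 3, ("label", new_label))
    code[i] = ("je", new_label)
    return True


def make_fragment():
    return [
        ("mov", "%dl", "%sil"),
        ("test", "%dl", "%dl"),
        ("je", ".Lj953"),
        ("call", "GET_DEFAULT_SEGMENT_OF_REF"),
        ("cmp", "32(%rbx)", "%eax"),
        ("je", ".Lj951"),
        ("label", ".Lj953"),
        ("test", "%sil", "%sil"),
        ("jne", ".Lj950"),
        ("mov", "%rbx", "%rcx"),
    ]


code = make_fragment()
changed = thread_known_branch(code, 2, {"%sil"})
```

Unlike the log, the sketch leaves the original .Lj953 label in place. Dropping it once it is unreferenced would be a separate dead-label pass.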
In aoptx86, many TEST instructions are merged and some small branches are removed. Before:
.globl AOPTX86$_$TX86ASMOPTIMIZER_$__$$_REGREADBYINSTRUCTION$TREGISTER$TAI$$BOOLEAN
...
jmp .Lj303
.balign 16,0x90
.Lj463:
testb $8,9(%r12)
jne .Lj501
testb $32,9(%r12)
setneb %dil
jmp .Lj303
.Lj501:
movb $1,%dil
jmp .Lj303
.balign 16,0x90
.Lj464:
testb $32,8(%r12)
jne .Lj505
testb $8,9(%r12)
jne .Lj505
testb $32,9(%r12)
setneb %dil
jmp .Lj303
.Lj505:
...
After:
.globl AOPTX86$_$TX86ASMOPTIMIZER_$__$$_REGREADBYINSTRUCTION$TREGISTER$TAI$$BOOLEAN
...
jmp .Lj303
.balign 16,0x90
.Lj463:
testb $40,9(%r12)
setneb %dil
jmp .Lj303
.balign 16,0x90
.Lj464:
testb $32,8(%r12)
jne .Lj505
testb $40,9(%r12)
setneb %dil
jmp .Lj303
.Lj505:
...
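The merge in .Lj463 is valid because the jne .Lj501 path sets %dil to 1, which is exactly what setne would produce. "Bit 3 set, or else bit 5 set" therefore collapses into a single test against the combined mask 8 | 32 = 40. This small exhaustive check in Python models only this fragment, not the optimiser. It confirms that the two versions agree for every byte value.

```python
def before(m):
    """.Lj463 before the patch, as a function of the byte at 9(%r12); returns %dil."""
    if m & 8:                      # testb $8,9(%r12) ; jne .Lj501
        return 1                   # .Lj501: movb $1,%dil
    return int((m & 32) != 0)      # testb $32,9(%r12) ; setneb %dil


def after(m, mask=8 | 32):
    """.Lj463 after the patch: a single testb with the combined mask ($40)."""
    return int((m & mask) != 0)    # testb $40,9(%r12) ; setneb %dil


# Exhaustively compare both versions over every possible byte value.
MERGE_IS_SAFE = all(before(m) == after(m) for m in range(256))
```

The first testb $32,8(%r12) in .Lj464 reads a different byte, so it cannot join the merge.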
This is also an interesting candidate for a possible future optimisation, because the last three instructions of the .Lj464 branch are identical to the .Lj463 branch, so the two could theoretically be merged (although jumping to a label that is no longer aligned may carry a minor performance cost), i.e.:
.globl AOPTX86$_$TX86ASMOPTIMIZER_$__$$_REGREADBYINSTRUCTION$TREGISTER$TAI$$BOOLEAN
...
jmp .Lj303
.balign 16,0x90
.Lj464:
testb $32,8(%r12)
jne .Lj505
.Lj463:
testb $40,9(%r12)
setneb %dil
jmp .Lj303
.Lj505:
...
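The merge described above is a form of tail merging, also known as cross-jumping. The following is a minimal sketch. It assumes blocks are plain Python lists of instruction tuples, and fold_identical_tail is a hypothetical helper. If one block is entirely a suffix of another, its label can be placed in front of that suffix and the duplicate copy deleted.

```python
def common_suffix_len(a, b):
    """Number of trailing instructions shared by blocks a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def fold_identical_tail(label_a, body_a, body_b):
    """If body_a is entirely a suffix of body_b, return body_b with label_a
    placed in front of that suffix, so body_a's own copy can be deleted.
    Returns None when the tails do not match (hypothetical sketch)."""
    n = common_suffix_len(body_a, body_b)
    if n == 0 or n != len(body_a):
        return None
    return body_b[:-n] + [("label", label_a)] + body_b[-n:]


lj463 = [("testb", "$40", "9(%r12)"), ("setneb", "%dil"), ("jmp", ".Lj303")]
lj464 = [("testb", "$32", "8(%r12)"), ("jne", ".Lj505"),
         ("testb", "$40", "9(%r12)"), ("setneb", "%dil"), ("jmp", ".Lj303")]
merged = fold_identical_tail(".Lj463", lj463, lj464)
```

The sketch ignores alignment. In the merged layout, .Lj463 is no longer preceded by .balign, which is the alignment concern mentioned above.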