Refactor for scanner.pas and tokens.pas
(I have no clue what this “merge request” thing everyone keeps talking about is; someday I'll read the book on Git you mentioned, I promise...)
Hereby I propose the following.
First, outline three repeated fragments in compiler/scanner.pas into helper routines. This should not be any worse in the first place, since the compiler does not bother to optimize the re-reads anyway, and further optimizations will come on top of it.
1-outline.patch
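For illustration only (the fragment and all names below are invented, not the actual scanner.pas code), “outlining” just means turning a thrice-repeated fragment into one nested procedure and three calls:

procedure ReadToken;
  { Hypothetical helper holding the formerly repeated fragment. }
  procedure SkipLineComment;
  begin
    repeat
      ReadChar;
    until c in [#10, #13, #0];
  end;
begin
  { ... }
  SkipLineComment; { each of the three former copies becomes a call }
  { ... }
end;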
Second, change arraytokeninfo[tt].op, where NOTOKEN means “unchanged” and forces you to branch, into arraytokeninfo[tt].to_op, which yields tt itself instead of NOTOKEN.
2-to_op.patch
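A minimal sketch of the difference (variable names are assumed, not the real tokens.pas declarations). Before, every consumer branches on the sentinel:

op := arraytokeninfo[tt].op;
if op = NOTOKEN then
  op := tt; { sentinel: "unchanged" }

After, the table stores tt itself, so the lookup is unconditional:

op := arraytokeninfo[tt].to_op;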
Third, replace the binary search narrowed by tokenidx with a perfect hash. The only downside is that it may break if another keyword is added, but if that happens you can find another function by brute-forcing a different set of multipliers, giving the table more space, or making the hash analyze more characters.
3-perfhash.patch
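A sketch of how the lookup could look on the caller's side (the table name and the _ID fallback are my assumptions for illustration, not the actual patch):

{ 2048-entry table indexed by PerfectTokenHash; empty slots hold _ID. }
h := PerfectTokenHash(pattern);
if tokenHashTable[h].keyword = pattern then
  token := tokenHashTable[h].token
else
  token := _ID; { not a reserved word }

Since the hash is masked with “and 2047”, h always lands inside the table, so no range check is needed; one string comparison confirms or rejects the hit.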
Fourth, rename arraytokeninfo to tokeninfo and remove the old tokeninfo with its indirections.
4-tokeninfo.patch
An alternative function I was considering:
function PerfectTokenHash(const s: shortstring): uint32;
const
  M1 = 1; M2 = 21; M3 = 3; M4 = 73;
var
  ns: uint8;
begin
  ns := length(s);
  if ns = 0 then
    exit(0);
  { Weighted sum of four sampled characters (first, quarter, middle, last),
    shifted so that words of 'A's map to zero, masked to 11 bits. }
  result :=
    (int32(s[1]) * M1
     + int32(s[1 + ns div 4]) * M2
     + int32(s[1 + ns div 2]) * M3
     + int32(s[ns]) * M4
     - ord('A') * (M1 + M2 + M3 + M4)) and 2047;
end;
But it analyzes only 4 characters instead of 8 and is thus more likely to require a thorough replacement rather than simple tweaking.
Even if perfect hashing becomes troublesome one day, the perfect hash table can easily be turned into an ordinary hash table ¯\_(ツ)_/¯.
In fact, right now even linear search performs better than binary search, even in its worst case: trying to recognize an unknown 6-character word starting with S. The known and thus traversed words are sealed, static, stored, strict, string, and system.
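To make that worst case concrete, here is a hedged sketch of such a linear scan (the function and its result convention are invented for illustration):

function FindSWord(const pattern: shortstring): int32;
const
  SWords: array[0..5] of string[6] =
    ('sealed', 'static', 'stored', 'strict', 'string', 'system');
var
  i: int32;
begin
  for i := 0 to high(SWords) do
    if SWords[i] = pattern then
      exit(i);
  result := -1; { unknown word: six failed comparisons and out }
end;

An unknown word costs at most six short-string comparisons, each usually rejected after a character or two, which is why it can beat the branch-heavy binary search on such a small set.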